<a href="https://colab.research.google.com/github/harmonydata/harmony/blob/main/Harmony_example_walkthrough.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![The Harmony Project logo](https://raw.githubusercontent.com/harmonydata/brand/main/Logo/PNG/%D0%BB%D0%BE%D0%B3%D0%BE%20%D1%84%D1%83%D0%BB-05.png)

<a href="https://harmonydata.ac.uk"><span align="left">🌐 harmonydata.ac.uk</span></a>
<a href="https://www.linkedin.com/company/harmonydata"><img align="left" src="https://raw.githubusercontent.com//harmonydata/.github/main/profile/linkedin.svg" alt="Harmony | LinkedIn" width="21px"/></a>
<a href="https://twitter.com/harmony_data"><img align="left" src="https://raw.githubusercontent.com//harmonydata/.github/main/profile/x.svg" alt="Harmony | X" width="21px"/></a>
<a href="https://www.instagram.com/harmonydata/"><img align="left" src="https://raw.githubusercontent.com//harmonydata/.github/main/profile/instagram.svg" alt="Harmony | Instagram" width="21px"/></a>
<a href="https://www.facebook.com/people/Harmony-Project/100086772661697/"><img align="left" src="https://raw.githubusercontent.com//harmonydata/.github/main/profile/fb.svg" alt="Harmony | Facebook" width="21px"/></a>
<a href="https://www.youtube.com/channel/UCraLlfBr0jXwap41oQ763OQ"><img align="left" src="https://raw.githubusercontent.com//harmonydata/.github/main/profile/yt.svg" alt="Harmony | YouTube" width="21px"/></a>

# Harmony walkthrough - Python library

You can run this notebook in Google Colab: <a href="https://colab.research.google.com/github/harmonydata/harmonydata/blob/main/Harmony%20example%20walkthrough.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Harmony is a data harmonisation tool that uses natural language processing to identify questions in different questionnaires that are semantically similar. Harmony is a collaboration between [Ulster University](https://ulster.ac.uk/), [University College London](https://ucl.ac.uk/), the [Universidade Federal de Santa Maria](https://www.ufsm.br/), and [Fast Data Science](http://fastdatascience.com/). Harmony is funded by [Wellcome](https://wellcome.org/) as part of the [Wellcome Data Prize in Mental Health](https://wellcome.org/grant-funding/schemes/wellcome-mental-health-data-prize).

This walkthrough compares questionnaire items whose questions have already been extracted from PDFs. If you want to process PDFs yourself, you also need to install
Java and [Apache Tika](https://tika.apache.org/); see the Harmony README.

![my badge](https://badgen.net/badge/Status/In%20Development/orange)

[![PyPI package](https://img.shields.io/badge/pip%20install-harmonydata-brightgreen)](https://pypi.org/project/harmonydata/)


## Install the Harmony Python library from PyPI

In [1]:
!pip install sentence-transformers



In [2]:
!pip install harmonydata
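
To confirm that both installs succeeded before moving on, the installed versions can be read back with the standard library. This is a minimal sketch: the helper name `installed_version` is made up for illustration, and it uses only `importlib.metadata`, not any Harmony API.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the two distributions installed in the cells above.
for name in ("harmonydata", "sentence-transformers"):
    print(name, installed_version(name) or "not installed")
```

If either line reports "not installed", rerun the corresponding `pip install` cell before continuing.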



## Download the models and load example questionnaires

In [3]:
import harmony
harmony.download_models()

Downloading spaCy models to /root/harmony.
Set environment variable HARMONY_SPACY_PATH if you want to change model file location.
Error: Path already exists on your computer:  /root/harmony/harmony_spacy_models
Exiting spaCy model downloader.
Run download_models(True) to force redownload.
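
The messages above mention two options: the `HARMONY_SPACY_PATH` environment variable and `download_models(True)`. A minimal sketch of using both follows. The helper names and the example path are hypothetical, and setting the variable before importing `harmony` is an assumption about when the library reads it.

```python
import os


def set_spacy_model_path(path):
    """Set HARMONY_SPACY_PATH so the spaCy models are stored under `path`."""
    os.environ["HARMONY_SPACY_PATH"] = path
    return os.environ["HARMONY_SPACY_PATH"]


def redownload_models():
    """Force a fresh download, as the message above suggests."""
    # Imported here so the variable is already set when harmony loads
    # (assumption: the library reads it at import or download time).
    import harmony

    harmony.download_models(True)


# Hypothetical location; pick any writable directory.
set_spacy_model_path("/tmp/harmony_models")
```

Calling `redownload_models()` afterwards would fetch the models again, which is only needed if the existing files are missing or damaged.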


In [4]:
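# Pick two of the bundled example instruments: CES-D (English) and GAD-7 (Portuguese).
instruments = harmony.example_instruments["CES_D English"], harmony.example_instruments["GAD-7 Portuguese"]
# Match the questions of both instruments; `similarity` holds the pairwise scores.
questions, similarity, query_similarity, new_vectors_dict = harmony.match_instruments(instruments)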

See the questions

In [6]:
for q in questions:
    print(q.question_text)

I was bothered by things that usually don’t bother me.
I did not feel like eating; my appetite was poor.
I felt that I could not shake off the blues even with help from my family or friends.
I felt I was just as good as other people.
I had trouble keeping my mind on what I was doing.
I felt depressed.
I felt that everything I did was an effort.
I felt hopeful about the future.
I thought my life had been a failure.
I felt fearful.
My sleep was restless.
I was happy.
I talked less than usual.
I felt lonely.
People were unfriendly.
I enjoyed life.
I had crying spells.
I felt sad.
I felt that people dislike me.
I could not get “going.”
Sentir-se nervoso/a, ansioso/a ou muito tenso/a
Não ser capaz de impedir ou de controlar as preocupações
Preocupar-se muito com diversas coisas
Dificuldade para relaxar
Ficar tão agitado/a que se torna difícil permanecer sentado/a
Ficar facilmente aborrecido/a ou irritado/a
Sentir medo como se algo horrível fosse acontecer
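
The list above contains the 20 CES-D items followed by the 7 GAD-7 items, so the most interesting entries of the `similarity` array returned by `match_instruments` are those pairing an item from one instrument with an item from the other. The sketch below shows one way to rank such cross-instrument pairs. The helper `top_cross_matches` and the toy matrix are illustrative only and are not part of the Harmony API; it assumes the rows and columns of the matrix follow the order of `questions`.

```python
def top_cross_matches(similarity, n_first, k=3):
    """Return the k highest-scoring (row, col, score) pairs that link one of the
    first `n_first` questions to one of the remaining questions."""
    n_total = len(similarity)
    pairs = [
        (i, j, float(similarity[i][j]))
        for i in range(n_first)            # questions from the first instrument
        for j in range(n_first, n_total)   # questions from the second instrument
    ]
    pairs.sort(key=lambda p: p[2], reverse=True)
    return pairs[:k]


# Toy 4x4 matrix: questions 0-1 belong to one instrument, 2-3 to another.
toy = [
    [1.0, 0.3, 0.6, -0.2],
    [0.3, 1.0, 0.1, 0.5],
    [0.6, 0.1, 1.0, 0.4],
    [-0.2, 0.5, 0.4, 1.0],
]
matches = top_cross_matches(toy, n_first=2, k=2)
print(matches)  # [(0, 2, 0.6), (1, 3, 0.5)]
```

With the arrays from this notebook, `top_cross_matches(similarity, n_first=20)` would rank CES-D and GAD-7 pairs, and `questions[i].question_text` gives the wording for each index under the ordering assumption stated above.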


See the similarity matrix

In [10]:
similarity

array([[ 1.        ,  0.31365009,  0.3432307 , -0.26082838,  0.42788809,
         0.34054826, -0.30748931, -0.18449384, -0.25914567,  0.31232794,
         0.28057174, -0.28101036,  0.48577073,  0.27214031, -0.28000395,
        -0.1989061 ,  0.28694488,  0.31094229,  0.37545967,  0.28829142,
         0.33788017,  0.44290319,  0.43870789, -0.26580209,  0.38783192,
         0.53001139,  0.25845853],
       [ 0.31365009,  1.        ,  0.32531273, -0.38449687, -0.39382816,
        -0.43607552, -0.44434263, -0.23701805, -0.46996568, -0.36608223,
        -0.34493578, -0.29561759,  0.36299874,  0.38550648, -0.20762318,
        -0.30174484, -0.41118521, -0.42717949, -0.3394137 ,  0.37951918,
        -0.31706226,  0.11435866, -0.11078685, -0.19496469, -0.25027669,
        -0.1438235 , -0.24803395],
       [ 0.3432307 ,  0.32531273,  1.        , -0.45727841,  0.36003784,
         0.44618643, -0.42400636, -0.30709133, -0.40909897,  0.34075577,
        -0.22367309, -0.39270998, -0.22370194,  0.3298