# Use case: Finding similar analyses based on brain region and cell type

This is the code to find all similar analyses that are from the same specie, brain region and cell types.

The result is also stored in a csv for use at the website.

If you have cloned this project, the generated similarity file is already created and all you need to run is the `1. Create the Rodent Basal Ganglia Graph` notebook on the start page.

In [64]:
from neo4j import GraphDatabase, basic_auth
from dotenv import load_dotenv
import os

load_dotenv()

neo4jUser = os.getenv("NEO4J_USER")
neo4jPwd = os.getenv("NEO4J_PASSWORD")
neo4jUrl = os.getenv("NEO4j_BOLT")

driver = GraphDatabase.driver(neo4jUrl,auth=basic_auth(neo4jUser, neo4jPwd), encrypted=False)

## Create the graph projection

Create a projection of analyses connected with brain region and cell type, per specie

For simplicity, we first add a special named relationship between the analyzes and brain region, cell type and specie.
This relationship is cleaned up after the algorithm is complete.

In [65]:
with driver.session() as session:
    session.run("""
        MATCH (n:Analysis)-->(:DataType)-->(:RegionRecord)-[:PRIMARY_REGION]->(b:BrainRegion)
        MERGE (n)-[:NODE_SIMILARITY]->(b)
    """)
    session.run("""
        MATCH (n:Analysis)-->(c:CellType)
        MERGE (n)-[:NODE_SIMILARITY]->(c)
    """)
    session.run("""
        MATCH (n:Analysis)-->(:Specimen)-->(s:Specie)
        MERGE (n)-[:NODE_SIMILARITY]->(s)
    """)
    session.run("""
        MATCH (n:Analysis)-->(s:NeuralStructure)
        MERGE (n)-[:NODE_SIMILARITY]->(s)
    """)



In [66]:
with driver.session() as session:
    res = session.run("""
        CALL gds.graph.create(
            'analyses', 
            ["CellType", "BrainRegion", "Specie", "Analysis", "NeuralStructure"], 
            'NODE_SIMILARITY'
        )
    """)
    for rec in res:
        print("Created projection of", rec["nodeCount"], "nodes")

Created projection of 875 nodes


## Run the similarity algorithm and store the results

### Analyses

Runs node similarity on the projection `analyses` and stores the result in `Data/csvs/basal_ganglia/regions/analysis_similarity.csv`.

In [67]:
import pandas as pd

similarity_rows = []
with driver.session() as session:
    res = session.run("""
        CALL gds.nodeSimilarity.stream(
            'analyses',
            {
                degreeCutoff: 3,
                similarityCutoff: 1.0
            }
        )
        YIELD node1, node2, similarity
        RETURN gds.util.asNode(node1).id as id1, gds.util.asNode(node2).id as id2, similarity
        ORDER BY id1
    """)
    for record in res:
        similarity_rows.append([record["id1"], record["id2"], record["similarity"]])


    
pd.DataFrame(similarity_rows, columns = ["id1", "id2", "score"])

Unnamed: 0,id1,id2,score
0,101,550,1.0
1,101,593,1.0
2,101,498,1.0
3,101,102,1.0
4,102,550,1.0
...,...,...,...
3421,91,292,1.0
3422,91,290,1.0
3423,91,288,1.0
3424,91,78,1.0


In [68]:
# Stores the result in a CSV file for the genereal database import
_.to_csv("..\Data/csvs/basal_ganglia/regions/analysis_similarity.csv")
_.to_csv("..\Data/csvs/basal_ganglia_valid_ascii/regions/analysis_similarity.csv")
print(_["id1"].nunique(), "unique analyses connected to", _["id2"].nunique(), "distinct analyses")

517 unique analyses connected to 405 distinct analyses


We can also store the relationships in the data base for example for visualization. Below er create the relationship `NODE_ANALYSES_SIMILARITY`.

In [69]:
with driver.session() as session:
    res = session.run("""
        CALL gds.nodeSimilarity.write(
          'analyses',
          {
            degreeCutoff: 4,
            similarityCutoff: 1.0,
            writeRelationshipType: 'NODE_ANALYSES_SIMILARITY',
            writeProperty:'score'
          }
        )
    """)
    for rec in res:
        print("Wrote", rec["relationshipsWritten"], "relationships")

Wrote 3426 relationships


## Clean-up
We remove the projected graphs, and delete the created relationship `SIMILARITY_ALGORITHM`. The relationship `NODE_ANALYSES_SIMILARITY` will not be deleted until the end.

In [70]:
with driver.session() as session:
    session.run("call gds.graph.drop('analyses')")
    session.run("""
        MATCH ()-[r:NODE_SIMILARITY]-()
        DETACH DELETE r
    """)

# Use case: Can we say anythong about the methods?

*Is it possible to say anything about the methods the researchers used in the experiments when counting a cell type in a specific region that correlates to the result?*

This part tries to answer the stated question.

From the previous step, we have all nodes that have same Object of interest, cell type, specie and brain region connected with the relationship `NODE_ANALYSES_SIMILARITY`.

Next we define communities based on this. That will create a community for each Analysis. We write this back into the graph database. Next we add relationships to the methods called `USE_CASE`. 


In [80]:
with driver.session() as session:
    res = session.run("""
        CALL gds.graph.create.cypher(
            'analyses', 
            'MATCH (n:Analysis {dataType: "Quantitation"})-[:NODE_ANALYSES_SIMILARITY]-() WITH DISTINCT n as nodes UNWIND nodes as x RETURN id(x) as id',
            'MATCH (n:Analysis {dataType: "Quantitation"})-[:NODE_ANALYSES_SIMILARITY]-(m:Analysis {dataType: "Quantitation"}) RETURN id(n) AS source, id(m) AS target'

        )
    """)
    for rec in res:
        print("Created projection of", rec["nodeCount"], "nodes")

Created projection of 499 nodes


In [81]:
import pandas as pd

louvain_table = []
with driver.session() as session:
    res = session.run("""
        CALL gds.louvain.stream('analyses')
        YIELD nodeId, communityId
        RETURN communityId AS louvainId, COUNT(DISTINCT nodeId) AS members
        ORDER BY members DESC
    """)
    
    for rec in res:
        louvain_table.append([rec["louvainId"], rec["members"]])

pd.DataFrame(louvain_table, columns=["Louvain Id", "Size"])

Unnamed: 0,Louvain Id,Size
0,134,72
1,208,26
2,166,24
3,384,19
4,265,16
...,...,...
85,405,2
86,408,2
87,447,2
88,77,1


In [82]:
with driver.session() as session:
    session.run("""
        CALL gds.louvain.write(
          'analyses',
          {
            writeProperty: 'louvainAnalyses'
          }
        )
    """)

    session.run("call gds.graph.drop('analyses')")

In the largest cluster, all nodes are not connected to each other, but because the similarity is transitive, as the similarity cutoff is set to 1.0, this does not matter. The rest of the communities are complete graphs.

In [84]:
with driver.session() as session:
    ## data types
    #session.run("CREATE (:AnalysisDataType {id: 1, name: 'Quantitation'})")
    #session.run("CREATE (:AnalysisDataType {id: 2, name: 'Distribution'})")
    #session.run("CREATE (:AnalysisDataType {id: 3, name: 'Morphology'})")
    
    #session.run("""
    #    MATCH (n:Analysis)
    #    MATCH (m:AnalysisDataType)
    #    WHERE n.dataType = m.name
    #    MERGE (n)-[:USE_CASE]->(m)
    #""")

    ## brain region
    session.run("""
        MATCH (n:Analysis)-->(:DataType)-->(:RegionRecord)-[:PRIMARY_REGION]->(b:BrainRegion)
        MERGE (n)-[:USE_CASE]->(b)
    """)
    ## Specie
    session.run("""
        MATCH (n:Analysis)-->(:Specimen)-->(s:Specie)
        MERGE (n)-[:USE_CASE]->(s)
    """)
    ## Microscope
    session.run("""
        MATCH (n:Analysis)-->()-->(m:Microscope)
        MERGE (n)-[:USE_CASE]->(m)
    """)
    ## Reporter
    session.run("""
        MATCH (n:Analysis)-->(:ReporterIncubation)-->(r:Reporter)
        MERGE (n)-[:USE_CASE {strength: 1}]->(r)
    """)
    ## CellularRegion
    session.run("""
        MATCH (n:Analysis)-->(:DataType)-->(r:CellularRegion)
        MERGE (n)-[:USE_CASE]->(r)
    """)
    ## Software
    session.run("""
        MATCH (n:Analysis)-->(:DataType)-->(s:Software)
        MERGE (n)-[:USE_CASE]->(s)
    """)
    ## RegionZone
    session.run("""
        MATCH (n:Analysis)-->(:DataType)-->(s:RegionZone)
        MERGE (n)-[:USE_CASE]->(s)
    """)
    ## SectioningInstrument
    session.run("""
        MATCH (n:Analysis)-->(s:SectioningInstrument)
        MERGE (n)-[:USE_CASE]->(s)
    """)

     ## CellType
    session.run("""
        MATCH (n:Analysis)-->(c:CellType)
        MERGE (n)-[:USE_CASE]->(c)
    """)
    
     ## CellType
    session.run("""
        MATCH (n:Analysis)-->(c:NeuralStructure)
        MERGE (n)-[:USE_CASE]->(c)
    """)


Now, with the relationships in place, we create a graph projection for each community and run the node similarity algorithm again to see if there are any analyses within a community that is similar

In [90]:
def get_method_similarity(degreeCutoff, simCutoff):
    tables = {}
    with driver.session() as session:
        session.run("""
            MATCH ()-[r:NODE_ANALYSES_SIMILARITY]-()
            DETACH DELETE r
        """)
        
        # create projection for each community
        for row in louvain_table:
            
            if(row[1] <= 1):
                continue
            community_id = row[0]
            
            name = 'analyses-' + str(community_id)
            res = session.run("""
                CALL gds.graph.create.cypher(
                    '%s',
                    'MATCH (a:Analysis)-[:USE_CASE]->(m) WHERE a.louvainAnalyses = %d WITH collect(a)+collect(m) as nodes UNWIND nodes as x RETURN DISTINCT id(x) as id', 
                    'MATCH (a:Analysis)-[:USE_CASE]->(m) WHERE a.louvainAnalyses = %d RETURN id(a) as source, id(m) as target'
                )""" % (name, community_id, community_id))

            #for rec in res:
            #    print("Created projection for", community_id, "with", rec["nodeCount"], "nodes")

            res = session.run("""
                CALL gds.nodeSimilarity.stream(
                    '%s',
                    {
                        degreeCutoff: %d,
                        similarityCutoff: %d
                    }
                )
                YIELD node1, node2, similarity
                RETURN gds.util.asNode(node1).id as id1, gds.util.asNode(node2).id as id2, similarity
                ORDER BY id1
            """ % (name, degreeCutoff, simCutoff))
            
            count = 0

            
            for record in res:
                if(count == 0):
                    tables[community_id] = []
                count+=1
                tables[community_id].append([record["id1"], record["id2"], record["similarity"]])

            ## Store the NODE_METHOD_SIMILARITY relationship between the analyses within the group:
            session.run("""
                CALL gds.nodeSimilarity.write(
                    '%s',
                    {
                        degreeCutoff: %d,
                        similarityCutoff: %d,
                        writeRelationshipType: 'NODE_METHOD_SIMILARITY',
                        writeProperty:'score'
                    }
                )
            """ % (name, degreeCutoff, simCutoff))

            session.run("call gds.graph.drop('%s')" % name)

    print("created", len(tables), "projections")
    return tables
    
get_method_similarity(10, 1.0)

created 10 projections


{134: [['170', '173', 1.0], ['173', '170', 1.0]],
 30: [['103', '104', 1.0], ['104', '103', 1.0]],
 64: [['165', '533', 1.0], ['533', '165', 1.0]],
 330: [['166', '535', 1.0], ['535', '166', 1.0]],
 332: [['167', '537', 1.0], ['537', '167', 1.0]],
 331: [['168', '536', 1.0], ['536', '168', 1.0]],
 329: [['169', '534', 1.0], ['534', '169', 1.0]],
 71: [['171', '172', 1.0], ['172', '171', 1.0]],
 117: [['250', '251', 1.0], ['251', '250', 1.0]],
 121: [['254', '257', 1.0], ['257', '254', 1.0]]}

Visualized: 
<img src="..\Data/visualizations/use_case_totally_similar.png" alt="Drawing" style="width: 300px;"/>

By this, we observe that only nodes from the same experiments are completely similar with methods.
We can now try to have degree cutoff at 9, to allow one thing to be not common:

In [91]:
get_method_similarity(9, 0.8)

created 27 projections


{134: [['170', '173', 1.0],
  ['170', '363', 0.5384615384615384],
  ['170', '362', 0.5],
  ['170', '449', 0.42857142857142855],
  ['173', '170', 1.0],
  ['173', '363', 0.5384615384615384],
  ['173', '362', 0.5],
  ['173', '449', 0.42857142857142855],
  ['362', '363', 0.9],
  ['362', '449', 0.5833333333333334],
  ['362', '173', 0.5],
  ['362', '170', 0.5],
  ['363', '362', 0.9],
  ['363', '173', 0.5384615384615384],
  ['363', '170', 0.5384615384615384],
  ['363', '449', 0.5],
  ['449', '362', 0.5833333333333334],
  ['449', '363', 0.5],
  ['449', '173', 0.42857142857142855],
  ['449', '170', 0.42857142857142855]],
 384: [['643', '644', 0.7647058823529411], ['644', '643', 0.7647058823529411]],
 265: [['338', '358', 1.0],
  ['338', '346', 1.0],
  ['338', '471', 0.8],
  ['338', '463', 0.8],
  ['338', '453', 0.8],
  ['338', '425', 0.8],
  ['338', '415', 0.8],
  ['338', '481', 0.7272727272727273],
  ['338', '493', 0.6363636363636364],
  ['346', '358', 1.0],
  ['346', '338', 1.0],
  ['346', '4

Visualization:

<img src="..\Data/visualizations/use_case_9of10_similar.png" alt="Drawing" style="width: 300px;"/>


We still observe that they come from the same experiment.

With this we have proven that we *cannot* say anything about the methods used in comparison with the results

In [63]:
with driver.session() as session:
    session.run("""
        MATCH ()-[r:NODE_ANALYSES_SIMILARITY]-()
        DETACH DELETE r
    """)
    session.run("""
        MATCH ()-[r:NODE_METHOD_SIMILARITY]-()
        DETACH DELETE r
    """)
    session.run("""
        MATCH ()-[r:USE_CASE]-()
        DETACH DELETE r
    """)
    session.run("""
        MATCH (n:AnalysisDataType)
        DETACH DELETE n
    """)