# SumMed Demo (Python Module) 

This notebook gives a short overview of how to use the SumMed library and demonstrates its key features.

<img src="summed_logo.png" width=600/>

### 1. Prepare environment
First, some technical preparation: we import the library, load the environment variables, and create the API object.

In [1]:
from dotenv import load_dotenv
from summed.data import (
    PlatformConfig, AnalysisConfig, FileInfo, DocumentSource,
    Document, SearchResult, TermExplanation,
)
from summed.analysis.configurations import (
    DETECT_SENTENCES, CREATE_SUMMARY, CREATE_ABSTRACTIVE_SUMMARY,
    DETECT_ENTITIES, DETECT_HEALTH_ENTITIES, GLOSSARY_LOOKUP,
    TRUSTED_SEARCH, CALCULATE_TRUST_SCORE,
)

from summed.summed_api import SumMedAPI


load_dotenv("../.env")
load_dotenv("../.env.testing")

import logging
logger = logging.getLogger()
logger.setLevel(logging.ERROR)

SumMed = SumMedAPI(name="Test")

# https://www.breastcancer.org/research-news/risk-reducing-effects-of-arimidex-last-years


### 2. Load a medical web article
We always start by uploading a file into our "Space". This gives us a single source of truth for the raw data that we can always refer back to. The idea is that users have their own spaces for all their documents, which simplifies compliance and data privacy.
Once the source file is stored in our space, we can extract its text and metadata into a __Document__.

In [2]:
# The URL to the document we want to analyse
url = "https://www.californiaprotons.com/breast-cancer/prevention-causes-risk-factors/"

# Load it into our "space"
source_file = SumMed.upload (url)

# and then extract it as a new document 
document = SumMed.extract (source_file)

print (f"Extracted text from file '{document.source_file.filename}' with title: '{document.title}'" )

Extracted text from file 'prevention_causes_risk_factors.html' with title: 'Prevent Breast Cancer - California Protons'


### Detect sentences
One of the first steps in the analysis is to identify correct sentence boundaries.
Depending on the source, the result is almost never "perfect", but it should catch the most important sentences, which we can then use for further analysis.
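
To illustrate why sentence detection is rarely perfect on web pages, here is a minimal sketch of a naive, punctuation-based splitter. It is not how `DETECT_SENTENCES` is implemented (the source does not say); `split_sentences` and the sample strings are made up for illustration.

```python
import re

def split_sentences(text: str) -> list:
    """Naive splitter: break after '.', '!' or '?' when followed by
    whitespace and a capital letter."""
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip())
    return [p for p in parts if p]

prose = "Breast cancer can be highly curable. Early detection helps."
menu = "Conditions Treated\nBreast\nProstate\nPediatric"

# Regular prose splits cleanly into two sentences.
print(split_sentences(prose))

# A navigation menu has no sentence punctuation, so it stays one long "sentence".
print(split_sentences(menu))

# Abbreviations are split too early.
print(split_sentences("Dr. Smith treats patients."))
```

A rule this simple fails in both directions: scraped menus get glued together, and abbreviations get split apart. Any real detector has to deal with both kinds of error.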

In [3]:
# Detect the sentence boundaries in the document
document = SumMed.analyze (document, DETECT_SENTENCES)

print (f"Detected {len(document.sentences)} sentences in the document: ")
document.sentences

Detected 54 sentences in the document: 


["SEE HOW WE'RE PROVIDING SAFE IN-PERSON CARE AND TELEMEDICINE APPOINTMENTS\nLearn More\nTelemedicine Options Now Available\nSearch Patient Portal 858.283.4771\nConditions Treated\nBreast\nProstate\nPediatric\nLung\nBrain & Spine\nBladder\nGastrointestinal\nStomach\nColon\nRectal\nAnal\nPancreatic & Bile\xa0Duct\nEsophageal\nLiver\nHead & Neck\nMouth\nThroat\nTongue\nNasal\nThyroid\nLymphoma\nSarcoma\nTesticular\nRecurrent & Secondary\nProton Therapy\nPatient Support\nPatient Support Overview\nPlan Visit\nCost & Coverage\nTelemedicine\nNutrition Services\nFAQs\nWhy Us\nWhy Us\nAffiliations\nOur Physicians\nNews\nCovid-19 Update\nPatient Testimonials\nVirtual Tour\nRequest Appointment\nSEE HOW WE'RE PROVIDING SAFE IN-PERSON CARE AND TELEMEDICINE APPOINTMENTS\nLearn More\nTelemedicine Options Now Available\nSearch\nConditions Treated\nBreast\nProstate\nPediatric\nLung\nBrain & Spine\nBladder\nGastrointestinal\nStomach\nColon\nRectal\nAnal\nPancreatic & Bile\xa0Duct\nEsophageal\nLiver\nHe


### Generate a summary
The first step is to create a so-called ___extractive summary___. This is basically a ranked list of the most important sentences from the document. Think of it as going over the document with a highlighter and marking the key sentences.

In [5]:
# Create a summary of the document
document = SumMed.analyze (document, CREATE_SUMMARY)

print (f"Here are the {len(document.summary)} most important sentences, which we think contain the essence of the document: ")
document.summary

Here are the 5 most important sentences, which we think contain the essence of the document: 


['Breast cancer is the most common female cancer worldwide and about 1 in 8 women in the United States will be diagnosed with it sometime in their lifetimes.',
 'However, breast cancer can be highly curable, especially when caught early.',
 'Caucasian women are slightly more likely to develop breast cancer than those of African American, Hispanic and Asian descent.',
 '\nChildbirth and menstruation cycles: If you haven’t had a full-term pregnancy or your first child before the age of 30, you might have a higher risk of getting breast cancer.',
 'The removal of your ovaries lowers the amount of estrogen in the body, which helps prevent breast cancers that require estrogen to grow.']
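
The idea of "ranking sentences by importance" can be sketched with a simple word-frequency score. This is only one classic way to build an extractive summary and is an assumption for illustration; the source does not say which ranking method `CREATE_SUMMARY` uses. `rank_sentences`, `STOPWORDS` and the sample sentences are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "it", "can", "be"}

def tokenize(sentence: str) -> list:
    """Lowercase the sentence and keep only non-stopword tokens."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]

def rank_sentences(sentences: list, top_n: int = 2) -> list:
    """Return the top_n sentences, scored by the average
    document-wide frequency of their words."""
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence: str) -> float:
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    return sorted(sentences, key=score, reverse=True)[:top_n]

sentences = [
    "Breast cancer is common.",
    "Breast cancer can be curable.",
    "Learn More",
]
print(rank_sentences(sentences, top_n=2))
```

Sentences that share the document's frequent terms score higher, so leftover page furniture such as "Learn More" drops out of the summary.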

### Generate an "abstractive" summary
The next step can be to create an __abstractive summary__. This is an A.I.-generated summary of the original text, i.e. the system writes a completely new text (not found in the document) about what it THINKS is the essence of the document. This is not always perfect, and we will need to optimize it further for medical texts.

In [6]:
# Create an "abstractive" (paraphrasing) summary of the document
document = SumMed.analyze (document, CREATE_ABSTRACTIVE_SUMMARY)

print (f"Here's how our A.I. would paraphrase the key ideas in the document: ")
document.abstractive_summary

Here's how our A.I. would paraphrase the key ideas in the document: 


["Hormone replacement therapy: If you took hormone replacement therapy (HRT) for menopause symptoms for more than five years, you might have an increased risk of breast cancer.\n\nThere are three key points in this medical text. The first is that childbirth and menstruation cycles can affect a woman's risk of developing breast cancer. The second is that hormone replacement therapy can also affect a woman's risk of developing breast cancer. The third is that Caucasian women are slightly more likely to develop breast cancer than those of African American, Hispanic and Asian descent."]

### Find medical entities within the text
___Named Entity Recognition___ (NER) is a way to identify named "things" and their categories.
SumMed currently supports a __"basic"__ version, which only detects very common entities (not very useful in our context),
and __"health"__ entity recognition, which uses Microsoft Azure Text Analytics for Health (a paid service).

In [7]:
# Detect the health-related named entities in the document
document = SumMed.analyze (document, DETECT_HEALTH_ENTITIES)


print (f"Here's a list of {len(document.health_entities)} different medical 'things' that are mentioned in the text, ordered by frequency: ")
[f"{e.label}: {e.text} ({e.count})" for e in document.health_entities]


Here's a list of 150 different medical 'things' that are mentioned in the text, ordered by frequency: 


['Diagnosis: Malignant neoplasm of breast (22)',
 'Diagnosis: Malignant Neoplasms (18)',
 'Gender: Woman (12)',
 'TreatmentName: Operative Surgical Procedures (5)',
 'BodyStructure: Breast (4)',
 'FamilyRelation: Family (3)',
 'GeneOrProtein: BRCA2 gene (3)',
 'GeneOrProtein: BRCA1 gene (3)',
 'TreatmentName: Proton Therapy (3)',
 'GeneOrProtein: PGR gene (2)',
 'GeneOrProtein: ESR1 gene (2)',
 'MedicationClass: Hormone Therapy (2)',
 'BodyStructure: breast cells (2)',
 'TreatmentName: Prophylactic treatment (2)',
 'Diagnosis: Inflammation (2)',
 'FamilyRelation: Offspring (2)',
 'FamilyRelation: First Degree Relative (2)',
 'Age: 55 (2)',
 'Diagnosis: COVID19 (disease) (2)',
 'HealthcareProfession: Physicians (2)',
 'HealthcareProfession: Nutrition Services (2)',
 'Diagnosis: Testicular dysfunction (2)',
 'Diagnosis: Sarcoma (2)',
 'Diagnosis: Lymphoma (2)',
 'BodyStructure: Thyroid Gland (2)',
 'BodyStructure: Nose (2)',
 'BodyStructure: Tongue (2)',
 'BodyStructure: Pharyngeal struc
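
With 150 entities, a flat list is hard to scan. A minimal sketch of grouping the entities by category could look like the following. The `Entity` dataclass is a stand-in that only mirrors the three attributes used above (`label`, `text`, `count`); it is not SumMed's own class, and `group_by_label` is a hypothetical helper.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Entity:
    """Stand-in for a detected entity (assumed shape, not SumMed's class)."""
    label: str
    text: str
    count: int

def group_by_label(entities) -> dict:
    """Group entities by label, most frequent first within each group."""
    grouped = defaultdict(list)
    for e in sorted(entities, key=lambda e: e.count, reverse=True):
        grouped[e.label].append(f"{e.text} ({e.count})")
    return dict(grouped)

# A few values copied from the output above.
entities = [
    Entity("Diagnosis", "Malignant neoplasm of breast", 22),
    Entity("GeneOrProtein", "BRCA2 gene", 3),
    Entity("Diagnosis", "Sarcoma", 2),
    Entity("TreatmentName", "Proton Therapy", 3),
]

for label, items in group_by_label(entities).items():
    print(f"{label}: {', '.join(items)}")
```

In the notebook, the same helper could be applied to `document.health_entities`, assuming its items expose those three attributes as the cell above suggests.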

### Find trusted information
Our trusted web search looks for highly relevant online articles that are closely related to our document and its key topics. Only known, trusted websites (such as those of renowned medical institutes) are considered.

In [8]:
# Lookup web articles from trusted sources that are highly related to the document 
document = SumMed.analyze (document, TRUSTED_SEARCH)

[f"{x.name} ({x.url}) " for x in document.search_results]

['Hormone Replacement Therapy - What You Need to Know (https://www.drugs.com/cg/hormone-replacement-therapy.html) ',
 'Hormone Replacement Therapy | HRT | Menopause | MedlinePlus (https://medlineplus.gov/hormonereplacementtherapy.html) ',
 'Hormone therapy: Is it right for you? - Mayo Clinic (https://www.mayoclinic.org/diseases-conditions/menopause/in-depth/hormone-therapy/ART-20046372) ',
 'Hormone Replacement Therapy for Menopause - Healthline (https://www.healthline.com/health/menopause/hormone-replacement-therapy-menopause) ',
 'Hormone Replacement Therapy - WebMD (https://www.webmd.com/menopause/features/hormone-replacement-therapy) ',
 'Hormone replacement therapy after breast cancer: Yes, No or maybe? (https://pubmed.ncbi.nlm.nih.gov/33508379/) ',
 'Try This: 36 Alternatives to Hormone Replacement Therapy (HRT) (https://www.healthline.com/health/menopause/alternatives-to-hrt) ']
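
Restricting results to "known, trusted websites" can be sketched as a domain allowlist check. This is an assumption about how such a filter might work, not a description of `TRUSTED_SEARCH` internals; `TRUSTED_DOMAINS` and `is_trusted` are hypothetical names, and the real list of trusted sites is not given in this notebook.

```python
from urllib.parse import urlparse

# Hypothetical allowlist for illustration only.
TRUSTED_DOMAINS = {"medlineplus.gov", "mayoclinic.org", "nih.gov"}

def is_trusted(url: str, trusted=TRUSTED_DOMAINS) -> bool:
    """True if the URL's host is a trusted domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in trusted)

urls = [
    "https://medlineplus.gov/hormonereplacementtherapy.html",
    "https://pubmed.ncbi.nlm.nih.gov/33508379/",
    "https://example.com/miracle-cure",
]
print([u for u in urls if is_trusted(u)])
```

The match is made on the parsed host name with a leading dot for subdomains, not on a plain substring, so a look-alike host such as `evilnih.gov` is not accepted.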

### Build a glossary of medical terms
We can add unknown terms to the document's "glossary" section, and let SumMed find a definition / explanation from an external medical terms dictionary.

In [9]:
# We add a few terms to the glossary, and then SumMed will look up and fill in their explanations
document.glossary = {}
document.glossary["cancer"] =  TermExplanation(term="cancer")
document.glossary["Hemorrhage"] = TermExplanation(term="Hemorrhage") 
document.glossary["Erythema"] = TermExplanation(term="Erythema")
 
document = SumMed.analyze (document, GLOSSARY_LOOKUP)

print (f"Here's a glossary for selected medical terms in the document: ")
[f"{k}:   {v.explanation}" for k,v in (document.glossary).items()]

Here's a glossary for selected medical terms in the document: 


['cancer:   not yet implemented',
 'Hemorrhage:   not yet implemented',
 'Erythema:   not yet implemented']
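
Since the lookup is not yet implemented, here is a minimal sketch of the intended fill-in step, using a small in-memory dictionary in place of the external medical dictionary. Everything here is an assumption for illustration: `Term` is a stand-in for `TermExplanation`, and `DICTIONARY` and `fill_glossary` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Term:
    """Stand-in for a glossary entry (assumed shape, not SumMed's class)."""
    term: str
    explanation: Optional[str] = None

# Tiny stand-in for an external medical terms dictionary.
DICTIONARY = {
    "hemorrhage": "Bleeding; the escape of blood from a damaged blood vessel.",
    "erythema": "Redness of the skin.",
}

def fill_glossary(glossary: dict) -> dict:
    """Fill in each entry's explanation, using a case-insensitive lookup."""
    for entry in glossary.values():
        entry.explanation = DICTIONARY.get(entry.term.lower(), "no definition found")
    return glossary

glossary = {t: Term(term=t) for t in ["Hemorrhage", "Erythema", "Proton"]}
for key, entry in fill_glossary(glossary).items():
    print(f"{key}:   {entry.explanation}")
```

Terms that the dictionary does not cover get an explicit marker, so they are easy to tell apart from terms that were never looked up.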

### Cleanup

In [10]:
SumMed.space.delete_all_files ()

True