# SumMed Demo - Multi-language support (Portuguese -> German + Russian)

This notebook demonstrates a scenario where the patient's preferred language is either **German** or **Russian**, while the document is available in **Portuguese**.

<img src="summed_logo.png" width=600/>

### 1. The source material
Two Portuguese websites dedicated entirely to hereditary cancer (HC):

- http://www.cancronafamilia.org
- http://www.cancrohereditario.pt/pt/

A Therapeutic Guidance Guide for people affected by hereditary cancer, developed by the Portuguese Institute of Oncology:
- http://www.ipoporto.pt/dev/wp-content/uploads/2018/12/Normas-de-orienta%C3%A7%C3%A3o-Oncogen%C3%A9tica.pdf 

A webpage from Evita's website with support materials such as leaflets about oncofertility, lymphedema, and BRCA mutations, plus advice on how women can take proper care of themselves while undergoing radiotherapy or after a mastectomy:

- https://www.evitacancro.org/cancro-hereditario/apoio-internacional/ 



Websites of some Portuguese medical societies that may be relevant:
- Portuguese Society of Senology - https://www.spsenologia.pt/ 
- Portuguese Society of Human Genetics - http://spgh.net/ 
- Portuguese Society of Gynecology - https://spginecologia.pt/ 
- Portuguese Association of Cancer Research - http://www.aspic.pt/ 


In [36]:
from dotenv import load_dotenv
from summed.data import PlatformConfig, AnalysisConfig, FileInfo, DocumentSource, Document, SearchResult, TermExplanation
from summed.analysis.configurations import DETECT_SENTENCES, CREATE_SUMMARY, CREATE_ABSTRACTIVE_SUMMARY, DETECT_ENTITIES, DETECT_HEALTH_ENTITIES, GLOSSARY_LOOKUP, TRUSTED_SEARCH, CALCULATE_TRUST_SCORE, PREPROCESS_TEXT, PROFILE_BASIC, PROFILE_FULL, TRANSLATE_TEXT 

from summed.summed_api import SumMedAPI


load_dotenv("../.env")
load_dotenv("../.env.testing")

import logging
logger = logging.getLogger()
logger.setLevel(logging.ERROR)

SumMed = SumMedAPI(name="Test")

# http://www.ipoporto.pt/dev/wp-content/uploads/2018/12/Normas-de-orienta%C3%A7%C3%A3o-Oncogen%C3%A9tica.pdf



### 2. Load the document
Upload a PDF document into our __Space__, and then extract the initial __Document__.
We get some initial metadata and the extracted text, ready for further processing.


In [38]:
# The filename of the document we want to analyse
from pathlib import Path
file_name = Path ("./unittests/data/normas-de-orientação-oncogenética.pdf").resolve()

# Load it into our "space"
source_file = SumMed.upload (file_name)



In [41]:
# and then extract it as a new document 
document = SumMed.extract (source_file)

print (f"Extracted text from file '{document.source_file.filename}' with title: '{document.title}'" )

Extracted text from file 'normas_de_orienta_C3_A7_C3_A3o_oncogen_C3_A9tica.pdf' with title: 'None'


### 3. Translate into English
For now, the best we can do is to translate the text into English first. This requires several processing steps on the __Document__:
1. PREPROCESS the text: language-independent cleanup (e.g. normalizing whitespace or hyphenation)
2. DETECT_SENTENCES: find sentence boundaries in the original language (pt)
3. TRANSLATE to English
4. DETECT_SENTENCES again (this time with a specialized English sentencizer)
5. CREATE_SUMMARY
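
To make the first step more concrete, here is a minimal sketch of what language-independent cleanup can look like. It is not the SumMed implementation of `PREPROCESS_TEXT`; the function name `preprocess_text` and the two regular expressions are assumptions made for illustration only.

```python
import re


def preprocess_text(text: str) -> str:
    """Hypothetical cleanup: re-join words hyphenated across a line break,
    then collapse every run of whitespace into a single space."""
    # "orienta-\nção" -> "orientação"
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
    # Spaces, tabs and newlines all become one space
    text = re.sub(r"\s+", " ", text)
    return text.strip()


raw = "Guia de Orienta-\nção   Terapêutica\n\nCANCRO  HEREDITÁRIO"
cleaned = preprocess_text(raw)
print(cleaned)
```

Note that this simple rule would also merge a genuine compound word that happens to break at the end of a line, so a production cleanup step would likely need a dictionary check or similar safeguard.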



In [42]:
# Preprocess the text (language-independent cleanup), then translate it into English
#document.language = "pt"
document = SumMed.analyze (document, PREPROCESS_TEXT)
print (f"Length of original text : {len(document.text)}")
print (f"text starts with: '{document.text[:200]}'")
print (f"Detected {len(document.sentences)} sentences in the document (first 3): {document.sentences[:3]}")


document = SumMed.analyze (document, TRANSLATE_TEXT["en"])
print (f"Detected {len(document.sentences)} sentences in the document (first 3): {document.sentences[:3]}")



Length of original text : 31628
text starts with: 'IPOPORTO INSTITUTO PORTUGUÊS DE ONCOLOGIA DO PORTO FRANCISCO GENTIL, EPE Guia de Orientação Terapêutica CANCRO HEREDITÁRIO Para cuidar de si! Ficha Técnica: Edição: IPO - Porto Propriedade: IPO - Port'
Detected 9 sentences in the document (first 3): ['ções de rastreio e/ou redução do risco.', 'cia de cancro hereditário não está descartada.', 'ta a história familiar (incluindo os indivíduos saudáveis).']
Detected 9 sentences in the document (first 3): ['screening and/or risk reduction.', 'hereditary cancer is not ruled out.', 'family history (including healthy individuals).']


In [43]:
document = SumMed.analyze (document, DETECT_SENTENCES)

print (f"Detected {len(document.sentences)} sentences in the document (first 3): {document.sentences[:3]}")

Detected 50 sentences in the document (first 3): ['Genetic counselling helps people make informed decisions about whether the tests available are useful, both for themselves and for their families.', 'The two most common hereditary predisposition syndromes for cancer are hereditary breast and ovarian cancer and Lynch syndrome (hereditary colorectal cancer without polyposis, HNPCC).', '2- Genetic tests Genetic tests of hereditary cancer are not "just another blood test" nor are they suitable for the general population.']
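
For readers unfamiliar with sentence detection, the following is a deliberately naive, regex-based sketch of the idea. It is an assumption-laden illustration, not the sentencizer behind `DETECT_SENTENCES`; the function name `split_sentences` is hypothetical, and a real sentencizer also handles abbreviations, headings, and list numbering.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by
    whitespace and an uppercase letter or a digit."""
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z0-9])", text.strip())
    return [p for p in parts if p]


sample = (
    "Genetic counseling is recommended to individuals with a history of cancer. "
    "The aim is to enable informed decisions! Is testing useful?"
)
sentences = split_sentences(sample)
for s in sentences:
    print(s)
```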


In [44]:
document.text

'IPOPORTO PORTUGUESE INSTITUTE OF ONCOLOGY OF PORTO FRANCISCO GENTIL, EPE Guidance Therapeutic Cancer HEREDITARY To take care of yourself! Fact Sheet: Edition: IPO - Port Property: IPO - Porto Texts: IPO - Porto Photography: Median - Global Communication and IPO - Port Design and Production: Median - Global Communication 2 HEREDITARY CANCER Hereditary Cancer 1- Genetic Counseling Genetic counseling is recommended to individuals with a history of cancer, personal and family, who suggest hereditary syndrome. The aim of genetic counseling is to enable individuals to better understand hereditary cancer, understand their own risks of developing cancer, and know the various screening and/or risk reduction options. Genetic counselling helps people make informed decisions about whether the tests available are useful, both for themselves and for their families. Genetic counseling includes: detailed review of family history, paying special attention to those who have already suffered from cancer

### Generate summaries
Generate **extractive** (= top-ranked sentences) and **abstractive** (= paraphrased by a text generation model) summaries.
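
As an illustration of the extractive idea, the sketch below ranks sentences by the average document-wide frequency of their words and keeps the top ones in their original order. This is a simple frequency heuristic chosen for the example, not the ranking method used by `CREATE_SUMMARY`; the function name `extractive_summary` and the sample sentences are assumptions.

```python
import re
from collections import Counter
from typing import List


def tokenize(sentence: str) -> List[str]:
    """Lowercase ASCII word tokens (good enough for the translated English text)."""
    return re.findall(r"[a-z]+", sentence.lower())


def extractive_summary(sentences: List[str], top_n: int = 2) -> List[str]:
    """Hypothetical ranking: score = mean corpus frequency of a sentence's words."""
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence: str) -> float:
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    # Restore document order so the summary reads naturally
    return [sentences[i] for i in sorted(ranked[:top_n])]


sample_sentences = [
    "Genetic counseling explains hereditary cancer risk.",
    "The leaflet was printed in Porto.",
    "Genetic tests assess hereditary cancer risk in families.",
]
summary = extractive_summary(sample_sentences, top_n=2)
print(summary)
```

The sketch skips stop-word removal and length normalization beyond the mean, both of which a real ranker would normally include.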

In [45]:
# Create a summary of the document
document = SumMed.analyze (document, CREATE_SUMMARY)

print (f"Here are the {len(document.summary)} most important sentences, which we think contain the essence of the document: ")
document.summary

Here are the 5 most important sentences, which we think contain the essence of the document: 


['The aim of genetic counseling is to enable individuals to better understand hereditary cancer, understand their own risks of developing cancer, and know the various screening and/or risk reduction options.',
 'Genetic counseling includes: detailed review of family history, paying special attention to those who have already suffered from cancer; assistance with the collection of relevant medical records in order to provide accurate risk assessment; explanation of the differences between sporadic cancers (they occur by chance in the population, so all are at the same risk) and hereditary cancers (which appear in certain families and may be associated with a mutation inherited from a specific gene); interpretation of the pattern or patterns of cancer in family history – some people may realize that the risk of getting cancer is lower than they thought and others that the risk is higher; discussion of the possibility of performing genetic tests and, if indicated, who is the best index ca

In [32]:
# Create an "abstractive" (paraphrasing) summary of the document
document = SumMed.analyze (document, CREATE_ABSTRACTIVE_SUMMARY)

print (f"Here's how our A.I. would paraphrase the key ideas in the document: ")
document.abstractive_summary

Here's how our A.I. would paraphrase the key ideas in the document: 


['1. The aim of genetic counseling is to help people understand their risks for developing cancer, and know the various screening and/or risk reduction options.\n2. Genetic counseling includes: detailed review of family history, paying special attention to those who have already suffered from cancer; assistance with the collection of relevant medical records in order to provide accurate risk assessment; explanation of the differences between sporadic cancers (they occur by chance in the population, so all are at the same risk) and hereditary cancers (which appear in certain families and may be associated with a mutation inherited from a specific gene); interpretation of the pattern or patterns of cancer in family history – some people may realize that the risk of getting cancer is lower than they thought and others that the risk is higher; discussion of the possibility of performing genetic tests and, if indicated, who is the best index case of the family; review of the procedure of a 

### Find medical entities within the text


In [33]:
# Detect the health-related named entities in the document
document = SumMed.analyze (document, DETECT_HEALTH_ENTITIES)


print (f"Here's a list of {len(document.health_entities)} different medical 'things' that are mentioned in the text, ordered by frequency: ")
[f"{e.label}: {e.text} ({e.count})" for e in document.health_entities]


Here's a list of 337 different medical 'things' that are mentioned in the text, ordered by frequency: 


['FamilyRelation: Family (67)',
 'Diagnosis: Malignant Neoplasms (50)',
 'Diagnosis: Malignant neoplasm of breast (22)',
 'Diagnosis: Colorectal Carcinoma (20)',
 'Diagnosis: Lynch Syndrome (19)',
 'GeneOrProtein: BRCA1 gene (16)',
 'GeneOrProtein: BRCA2 gene (15)',
 'GeneOrProtein: Genes (15)',
 'ExaminationName: Genetic screening method (14)',
 'Gender: Woman (11)',
 'FamilyRelation: Family member (11)',
 'Diagnosis: AMYLOIDOSIS, HEREDITARY, TRANSTHYRETIN-RELATED (9)',
 'Diagnosis: Malignant neoplasm of ovary (9)',
 'Diagnosis: Hereditary Breast and Ovarian Cancer Syndrome (8)',
 'Diagnosis: melanoma (7)',
 'Diagnosis: Pheochromocytoma (7)',
 'Gender: Male population group (7)',
 'Diagnosis: Paraganglioma (6)',
 'ExaminationName: Screening procedure (6)',
 'Age: 40 (6)',
 'Age: 50 (6)',
 'ExaminationName: Carrier testing (6)',
 'Diagnosis: Malignant neoplasm of pancreas (5)',
 'Diagnosis: Sarcoma (5)',
 'Diagnosis: Medullary carcinoma of thyroid (5)',
 'Diagnosis: Malignant neoplasm 
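
The frequency-ordered list above can be reproduced in spirit with a plain counter. The sketch below assumes entity mentions arrive as `(label, text)` pairs, which is a hypothetical input shape and not necessarily how SumMed represents them internally.

```python
from collections import Counter

# Hypothetical raw mentions, one tuple per occurrence in the text
mentions = [
    ("Diagnosis", "Lynch Syndrome"),
    ("GeneOrProtein", "BRCA1 gene"),
    ("Diagnosis", "Lynch Syndrome"),
    ("FamilyRelation", "Family"),
    ("FamilyRelation", "Family"),
    ("FamilyRelation", "Family"),
]

# Count identical (label, text) pairs and list them, most frequent first
counts = Counter(mentions)
report = [f"{label}: {text} ({n})" for (label, text), n in counts.most_common()]
print(report)
```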

### Find trusted information
Our trusted web search looks for highly relevant online articles that are closely related to our document and its key topics. Only known, trusted websites (such as those of renowned medical institutes) are considered.
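
One simple way to restrict results to known sites is a domain allowlist. The sketch below shows that idea only; the `TRUSTED_DOMAINS` set and the `is_trusted` helper are assumptions for illustration and do not reflect the actual list or logic behind `TRUSTED_SEARCH`.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; the real list of trusted sites is not shown in this notebook
TRUSTED_DOMAINS = {"cdc.gov", "nih.gov", "medlineplus.gov", "cancer.org"}


def is_trusted(url: str) -> bool:
    """True if the URL's host is an allowlisted domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in TRUSTED_DOMAINS)


candidates = [
    ("Genetic Counseling | CDC", "https://www.cdc.gov/genomics/gtesting/genetic_counseling.htm"),
    ("Some unrelated blog", "https://example.com/post"),
    ("Family communication about genetic risk", "https://pubmed.ncbi.nlm.nih.gov/15475667/"),
]
trusted = [(name, url) for name, url in candidates if is_trusted(url)]
print(trusted)
```

Matching on the full host suffix (with a leading dot) rather than a substring avoids accepting look-alike hosts such as `notcdc.gov.example.com`.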

In [34]:
# Look up web articles from trusted sources that are highly related to the document
document = SumMed.analyze (document, TRUSTED_SEARCH)

[f"{x.name} ({x.url}) " for x in document.search_results]

['Genetic counseling - Wikipedia (https://en.wikipedia.org/wiki/Genetic_counseling) ',
 'Genetic Counseling | CDC (https://www.cdc.gov/genomics/gtesting/genetic_counseling.htm) ',
 'Family communication about genetic risk: the little that is known (https://pubmed.ncbi.nlm.nih.gov/15475667/) ',
 'Family history of breast cancer: what do women understand and recall ... (https://pubmed.ncbi.nlm.nih.gov/9733031/) ',
 'What are the benefits of genetic testing?: MedlinePlus Genetics (https://medlineplus.gov/genetics/understanding/testing/benefits/) ',
 'Retinoblastoma: A Major Review - PubMed (https://pubmed.ncbi.nlm.nih.gov/34226484/) ',
 'Eugenics - Wikipedia (https://en.wikipedia.org/wiki/Eugenics) ',
 'Psychosocial Support Options for People with Cancer (https://www.cancer.org/treatment/survivorship-during-and-after-treatment/coping/understanding-psychosocial-support-services.html) ']

### Cleanup

In [35]:
SumMed.space.delete_all_files ()

True