<a href="https://colab.research.google.com/github/sheetal-kumar/repos/blob/main/Medical_Report_Summarization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import spacy
import textwrap
from spacy.lang.en.stop_words import STOP_WORDS
from string import punctuation
from heapq import nlargest
punctuation += '\n'           # treat newlines as punctuation so they are skipped when counting words
stopwords = list(STOP_WORDS)

reduction_rate = 0.1  # fraction of the input sentences kept in the summary

text = """I saw ABC back in Neuro-Oncology Clinic today. He comes in for an urgent visit because of increasing questions about what to do next for his anaplastic astrocytoma.
Within the last several days, he has seen you in clinic and once again discussed whether or not to undergo radiation for his left temporal lesion. The patient has clearly been extremely ambivalent about this therapy for reasons that are not immediately apparent. It is clear that his MRI is progressing and that it seems unlikely at this time that anything other than radiation would be particularly effective. Despite repeatedly emphasizing this; however, the patient still is worried about potential long-term side effects from treatment that frankly seem unwarranted at this particular time.
After seeing you in clinic, he and his friend again wanted to discuss possible changes in the chemotherapy regimen. They came in with a list of eight possible agents that they would like to be administered within the next two weeks. They then wanted another MRI to be performed and they were hoping that with the use of this type of approach, they might be able to induce another remission from which he can once again be spared radiation.
From my view, I noticed a man whose language has deteriorated in the week since I last saw him. This is very worrisome. Today, for the first time, I felt that there was a definite right facial droop as well. Therefore, there is no doubt that he is becoming symptomatic from his growing tumor. It suggests that he is approaching the end of his compliance curve and that the things may rapidly deteriorate in the near future.
Emphasizing this once again, in addition, to recommending steroids I once again tried to convince him to undergo radiation. Despite an hour, this again amazingly was not possible. It is not that he does not want treatment, however. Because I told him that I did not feel it was ethical to just put him on the radical regimen that him and his friend devised, we compromised and elected to go back to Temodar in a low dose daily type regimen. We would plan on giving 75 mg/sq m everyday for 21 days out of 28 days. In addition, we will stop thalidomide 100 mg/day. If he tolerates this for one week, we then agree that we would institute another one of the medications that he listed for us. At this stage, we are thinking of using Accutane at that point.
While I am very uncomfortable with this type of approach, I think as long as he is going to be monitored closely that we may be able to get away with this for at least a reasonable interval. In the spirit of compromise, he again consented to be evaluated by radiation and this time, seemed more resigned to the fact that it was going to happen sooner than later. I will look at this as a positive sign because I think radiation is the one therapy from which he can get a reasonable response in the long term.
I will keep you apprised of followups. If you have any questions or if I could be of any further assistance, feel free to contact me."""



In [None]:
nlp_pl = spacy.load('en_core_web_sm')     # load spaCy's small English pipeline
document = nlp_pl(text)                   # Doc object for the original text

tokens = [token.text for token in document]  # tokenized text

# Count every token that is neither a stop word nor punctuation.
# Keys keep the token's original casing, so 'Clinic' and 'clinic' are counted separately.
word_frequencies = {}
for word in document:
    lowered = word.text.lower()
    if lowered not in stopwords and lowered not in punctuation:
        word_frequencies[word.text] = word_frequencies.get(word.text, 0) + 1

max_frequency = max(word_frequencies.values())
print(max_frequency)

# Divide each count by the highest count so that every score falls in (0, 1].
for word in word_frequencies:
    word_frequencies[word] = word_frequencies[word] / max_frequency

print(word_frequencies)

6
{'saw': 0.3333333333333333, 'ABC': 0.16666666666666666, 'Neuro': 0.16666666666666666, 'Oncology': 0.16666666666666666, 'Clinic': 0.16666666666666666, 'today': 0.16666666666666666, 'comes': 0.16666666666666666, 'urgent': 0.16666666666666666, 'visit': 0.16666666666666666, 'increasing': 0.16666666666666666, 'questions': 0.3333333333333333, 'anaplastic': 0.16666666666666666, 'astrocytoma': 0.16666666666666666, 'days': 0.5, 'seen': 0.16666666666666666, 'clinic': 0.3333333333333333, 'discussed': 0.16666666666666666, 'undergo': 0.3333333333333333, 'radiation': 1.0, 'left': 0.16666666666666666, 'temporal': 0.16666666666666666, 'lesion': 0.16666666666666666, 'patient': 0.3333333333333333, 'clearly': 0.16666666666666666, 'extremely': 0.16666666666666666, 'ambivalent': 0.16666666666666666, 'therapy': 0.3333333333333333, 'reasons': 0.16666666666666666, 'immediately': 0.16666666666666666, 'apparent': 0.16666666666666666, 'clear': 0.16666666666666666, 'MRI': 0.3333333333333333, 'progressing': 0.16
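In the cell above the frequency table is keyed by each token's original casing, while the scoring cell further down looks words up in lowercase, so entries such as 'MRI' or 'Clinic' are never matched when sentences are scored. A minimal sketch of a case-insensitive variant follows. It works on plain strings rather than spaCy tokens, and the function name `normalized_frequencies` and the sample tokens are illustrative assumptions, not part of the notebook.

```python
from collections import Counter
from string import punctuation


def normalized_frequencies(tokens, stopwords):
    """Count lowercased tokens, skip stop words and punctuation, scale by the top count."""
    skip = set(punctuation) | {"\n"}
    counts = Counter(
        tok.lower()
        for tok in tokens
        if tok.lower() not in stopwords and tok not in skip
    )
    if not counts:
        return {}
    top = max(counts.values())
    return {word: n / top for word, n in counts.items()}


# Hypothetical token list standing in for `[token.text for token in document]`.
sample_tokens = ["Clinic", "clinic", "radiation", "the", "radiation", ",", "radiation", "MRI"]
freqs = normalized_frequencies(sample_tokens, stopwords={"the"})
print(freqs)
```

With this variant 'Clinic' and 'clinic' share one entry, so applying it to the notebook's text would produce values different from the output printed above.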

In [None]:
sentence_tokens = list(document.sents)

def get_sentence_scores(sentence_tok, len_norm=True):
    """Score each sentence by summing the normalized frequencies of its words."""
    sentence_scores = {}
    for sent in sentence_tok:
        word_count = 0
        for word in sent:
            key = word.text.lower()
            if key in word_frequencies:
                word_count += 1
                sentence_scores[sent] = sentence_scores.get(sent, 0) + word_frequencies[key]
        # Divide by the number of scored words; a sentence with no scored word gets no entry.
        if len_norm and word_count:
            sentence_scores[sent] = sentence_scores[sent] / word_count
    return sentence_scores

sentence_scores = get_sentence_scores(sentence_tokens, len_norm=False)       # sentence scoring without length normalization
sentence_scores_rel = get_sentence_scores(sentence_tokens, len_norm=True)    # sentence scoring with length normalization
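The two scoring modes can rank the same sentences differently: the raw sum favors long sentences, while dividing by the number of scored words favors sentences whose words are frequent on average. The sketch below illustrates this on plain word lists. The helper `score_sentence` and the two example sentences are hypothetical; the frequency values are taken from the printed table above.

```python
def score_sentence(tokens, frequencies, len_norm=True):
    """Sum the frequencies of known words; optionally divide by how many were known."""
    matched = [frequencies[t.lower()] for t in tokens if t.lower() in frequencies]
    if not matched:
        return 0.0
    total = sum(matched)
    return total / len(matched) if len_norm else total


frequencies = {"radiation": 1.0, "days": 0.5, "therapy": 1 / 3, "clinic": 1 / 3}

# Hypothetical sentences, already tokenized and stripped of stop words.
long_sentence = ["radiation", "therapy", "clinic", "days", "clinic", "days"]
short_sentence = ["radiation", "therapy"]

for name, sent in [("long", long_sentence), ("short", short_sentence)]:
    raw = score_sentence(sent, frequencies, len_norm=False)
    rel = score_sentence(sent, frequencies, len_norm=True)
    print(f"{name}: raw={raw:.3f} normalized={rel:.3f}")
```

Here the long sentence wins on the raw score and the short one wins after normalization, which is the same kind of disagreement visible between the NON_REL and REL summaries.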

In [None]:
def get_summary(sentence_sc, rate):
    """Join the highest-scoring sentences; `rate` is the fraction of sentences to keep."""
    summary_length = int(len(sentence_sc) * rate)
    top_sentences = nlargest(summary_length, sentence_sc, key=sentence_sc.get)
    return ' '.join(sent.text for sent in top_sentences)

print("- NON_REL: "+ get_summary(sentence_scores, reduction_rate))
print("- REL: "+ get_summary(sentence_scores_rel, reduction_rate))

- NON_REL: Because I told him that I did not feel it was ethical to just put him on the radical regimen that him and his friend devised, we compromised and elected to go back to Temodar in a low dose daily type regimen. I will look at this as a positive sign because I think radiation is the one therapy from which he can get a reasonable response in the long term.

- REL: It is clear that his MRI is progressing and that it seems unlikely at this time that anything other than radiation would be particularly effective. I will look at this as a positive sign because I think radiation is the one therapy from which he can get a reasonable response in the long term.
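`nlargest` returns sentences by descending score, which need not match their order in the source text, and `int(len(...) * rate)` can round down to zero for short inputs. A possible variant, sketched here on plain strings, keeps at least one sentence and restores document order before joining. The name `summarize_in_order` and the sample data are assumptions for illustration only.

```python
import textwrap
from heapq import nlargest


def summarize_in_order(sentences, scores, rate):
    """Pick the top-scoring sentences, then emit them in their original order."""
    if not sentences:
        return ""
    k = max(1, int(len(sentences) * rate))  # never return an empty summary
    top = nlargest(k, range(len(sentences)), key=lambda i: scores[i])
    return " ".join(sentences[i] for i in sorted(top))


# Hypothetical sentences and scores standing in for the notebook's spaCy spans.
sentences = ["First point.", "Minor detail.", "Key finding.", "Closing remark."]
scores = [0.4, 0.1, 0.9, 0.2]

summary = summarize_in_order(sentences, scores, rate=0.5)
print(textwrap.fill(summary, width=60))
```

`textwrap.fill` is used only to wrap long summaries for display; it does not change which sentences are selected.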



In [None]:
from summarizer import Summarizer, sentence_tokenizer


summarizer_model = Summarizer()

summary_length = int(len(sentence_tokens)*(reduction_rate))

# get_summary returns one scored entry per sentence; sort_score orders the entries by total score
print(summarizer_model.sort_score(summarizer_model.get_summary(text, "Title")))

[{'total_score': 0.1675, 'sentence': 'He comes in for an urgent visit because of increasing questions about what to do next for his anaplastic astrocytoma.', 'order': 1}, {'total_score': 0.16375, 'sentence': 'The patient has clearly been extremely ambivalent about this therapy for reasons that are not immediately apparent.', 'order': 3}, {'total_score': 0.1569631654214771, 'sentence': 'After seeing you in clinic, he and his friend again wanted to discuss possible changes in the chemotherapy regimen.', 'order': 6}, {'total_score': 0.15625, 'sentence': 'If you have any questions or if I could be of any further assistance, feel free to contact me.', 'order': 26}, {'total_score': 0.15125, 'sentence': 'It is clear that his MRI is progressing and that it seems unlikely at this time that anything other than radiation would be particularly effective.', 'order': 4}, {'total_score': 0.15125, 'sentence': 'Despite repeatedly emphasizing this; however, the patient still is worried about potential l
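The printed result is a list of dictionaries with `total_score`, `sentence` and `order` keys. Assuming that structure holds for every entry, a short helper could keep the best `n` entries and put them back in document order. The function `top_sentences_in_order` is hypothetical and not part of the `summarizer` package; the three sample entries are copied from the output above.

```python
def top_sentences_in_order(scored, n):
    """Keep the n highest-scoring entries and return their sentences in document order."""
    ranked = sorted(scored, key=lambda item: item["total_score"], reverse=True)
    chosen = sorted(ranked[:n], key=lambda item: item["order"])
    return [item["sentence"] for item in chosen]


# Three entries taken from the printed output, deliberately listed out of order.
scored = [
    {"total_score": 0.15625,
     "sentence": "If you have any questions or if I could be of any further assistance, feel free to contact me.",
     "order": 26},
    {"total_score": 0.1675,
     "sentence": "He comes in for an urgent visit because of increasing questions about what to do next for his anaplastic astrocytoma.",
     "order": 1},
    {"total_score": 0.1569631654214771,
     "sentence": "After seeing you in clinic, he and his friend again wanted to discuss possible changes in the chemotherapy regimen.",
     "order": 6},
]

selected = top_sentences_in_order(scored, n=2)
print(" ".join(selected))
```

In the notebook, `n` could be the `summary_length` value computed in the cell above, which is otherwise unused there.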

In [None]:
!pip install summarizer

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting summarizer
  Downloading summarizer-0.0.7.tar.gz (280 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 280.1/280.1 KB 6.5 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: summarizer
  Building wheel for summarizer (setup.py) ... done
  Created wheel for summarizer: filename=summarizer-0.0.7-py2.py3-none-any.whl size=284224 sha256=b8e4e69c4a73ad9c7e8b986f3047a3a7d2679b4848dbed1a4f5ee14fe4e23f20
  Stored in directory: /root/.cache/pip/wheels/9e/65/2d/389857cbe6fb8c98184cce938481b283faaf777023832e0aea
Successfully built summarizer
Installing collected packages: summarizer
Successfully installed summarizer-0.0.7


In [None]:
import spacy
nlp = spacy.load('en_core_web_sm')
doc = nlp(text)
# Print each named entity with its character offsets and predicted label.
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

ABC 6 9 ORG
Neuro-Oncology Clinic 18 39 ORG
today 40 45 DATE
astrocytoma 152 163 GPE
the last several days 172 193 DATE
eight 904 909 CARDINAL
the next two weeks 973 991 DATE
the week 1265 1273 DATE
Today 1320 1325 DATE
first 1335 1340 ORDINAL
an hour 1756 1763 TIME
Temodar 2023 2030 PRODUCT
daily 2045 2050 DATE
75 2089 2091 CARDINAL
21 days 2113 2120 DATE
28 days 2128 2135 DATE
100 2175 2178 CARDINAL
one week 2212 2220 DATE
Accutane 2354 2362 PERSON
one 2813 2816 CARDINAL
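The listing above shows that the general-purpose model assigns some questionable labels to clinical terms, for example `astrocytoma` as GPE and `Accutane` as PERSON. Grouping the entities by label makes such cases easier to spot. The sketch below works on plain tuples copied from that output; the helper `group_by_label` is an illustrative assumption, and with spaCy the tuples could be built from `(ent.text, ent.start_char, ent.end_char, ent.label_)`.

```python
from collections import defaultdict


def group_by_label(entities):
    """Map each entity label to the list of entity texts that carry it."""
    grouped = defaultdict(list)
    for text, _start, _end, label in entities:
        grouped[label].append(text)
    return dict(grouped)


# A subset of the (text, start_char, end_char, label) rows printed above.
entities = [
    ("ABC", 6, 9, "ORG"),
    ("today", 40, 45, "DATE"),
    ("astrocytoma", 152, 163, "GPE"),
    ("Temodar", 2023, 2030, "PRODUCT"),
    ("21 days", 2113, 2120, "DATE"),
    ("Accutane", 2354, 2362, "PERSON"),
]

by_label = group_by_label(entities)
for label, texts in sorted(by_label.items()):
    print(f"{label}: {', '.join(texts)}")
```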
