# IR in the Richard Cabot book

Case teaching in medicine; by Cabot, Richard C. (Richard Clarke), 1868-1939.

Accessing Solr via pysolr.

## Downloading, unzipping and starting Solr in cluster mode

In [1]:
! wget  http://ftp.unicamp.br/pub/apache/lucene/solr/8.1.1/solr-8.1.1.zip -q
! unzip -qq solr-8.1.1.zip && rm solr-8.1.1.zip
! ./solr-8.1.1/bin/solr start -c


Port 8983 is already being used by another process (pid: 98)
Please choose a different port using the -p option.



## Creating a collection

In [2]:
! ./solr-8.1.1/bin/solr create -c cabot -p 8983

         To turn off: bin/solr config -c cabot -p 8983 -action set-user-property -property update.autoCreateFields -value false
Created collection 'cabot' with 1 shard(s), 1 replica(s) with config-set 'cabot'


## Connecting to the Solr server

In [3]:
import pysolr
solr_server_url = 'http://localhost:8983/solr/'
solr_collection = "cabot"
solr = pysolr.Solr(solr_server_url + solr_collection)

## Loading the CSV via Pandas

In [4]:
import pandas as pd
cabot_book = pd.read_csv("../../data/case-teaching-cabot/case-teaching-cabot.csv")
cabot_book

Unnamed: 0,case
0,"CASE 1\n\nA liquor dealer, 47 years old, is se..."
1,CASE 2\n\nA fireman of 26 w&s exercising engin...
2,"CASE 3\n\nA medical s,tudent of 25 has been tr..."
3,"CASE 4 \n\nA married woman, 43 years old, is s..."
4,CASE 5 \n\nA vigorous man of 62 comee of a gou...
5,"CASE 6 \n\nA child, 7 years of age, of healthy..."
6,"CASE 7 \n\nA married woman of 50, has had thre..."
7,"CASE 8 \n\nA coachman, 42 years old, of good f..."
8,"CASE 9 \n\nJ. B., male, aged 32 (occupation, c..."
9,CASE 10 \n\nThe patient is a contractor of 50....


## Transferring CSV to Solr

In [5]:
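# Build one document per CSV row; row 0 becomes id 1, row 1 becomes id 2, and so on.
docs = [{"id": index + 1, "case": row["case"]}
        for index, row in cabot_book.iterrows()]

# Send all documents in one request and commit once, instead of once per row.
solr.add(docs, commit=True)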
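
For a larger collection, it may be worth sending documents in batches and committing only once at the end, since every commit asks Solr to make the pending changes searchable. The following is a minimal sketch of that idea. The helpers `to_solr_docs` and `chunked` are hypothetical (they are not part of pysolr), and the sample texts are stand-ins for the `case` column.

```python
from itertools import islice


def to_solr_docs(cases):
    """Turn an iterable of case texts into Solr documents with 1-based ids."""
    return [{"id": number, "case": text}
            for number, text in enumerate(cases, start=1)]


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Stand-in data; in the notebook this would be cabot_book["case"].
sample_cases = ["CASE 1 ...", "CASE 2 ...", "CASE 3 ..."]
docs = to_solr_docs(sample_cases)
batches = list(chunked(docs, 2))
print([len(batch) for batch in batches])

# With a live server (assumed to be the `solr` object created earlier):
# for batch in chunked(docs, 100):
#     solr.add(batch, commit=False)
# solr.commit()
```

The batch size of 100 in the commented usage is an arbitrary choice for illustration.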

# Querying in Solr

The queries below follow the syntax described in: [Lucene Query Syntax](http://www.solrtutorial.com/solr-query-syntax.html).
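
Most queries in this section have the form `field:"phrase"`. As a small sketch, the hypothetical helper `phrase_query` below (not part of pysolr) builds such a string and escapes the two characters that would otherwise break a quoted phrase: the backslash and the double quote.

```python
def phrase_query(field, phrase):
    """Return a Lucene query matching `phrase` as an exact phrase in `field`."""
    # Escape backslashes first so the escapes added for quotes are not doubled.
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    return '{0}:"{1}"'.format(field, escaped)


print(phrase_query("case", "CASE 53"))
print(phrase_query("case", 'the "grippe"'))
```

The resulting string can be passed directly to `solr.search(...)`.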

## Searching CASE 53

In [6]:
results = solr.search('case:"CASE 53"')

print("Found {0} documents".format(len(results)))

for result in results:
    print(result['case'], "\n")

Found 1 documents
["CASE 53 \n\nSingle Iftdy, 57 years old, always more or less of a nervoua invalid, \nconsults a physician for palpitation and dyspncea on exertion. The \nmenopause occurred five years ago, and since then she has been getting \nvery stout and disinclined to exertion. She is thirsty and her skin is \ndry and perspires very little. Of late, the feet have been swelling and \nher face seems puffy all the time, not especially under the eyes. She \nis troubled a great deal with headaches, worse, at night, and her hair has \nbeen coming out of late. No sore throat, but the shin bones are tender \nand the tissues over them pit slightly on pressure. The bowels are very \ncostive, appetite capricious, sleep disturbed by headache. Her mem- \nory is very poor and she takes little interest in anything. \n\nPhysical Examination: Heart's area cannot be marked out on ac- \ncount of the great thickness of the fat layer. The apex is not seen \nor felt; best heard in sixth space, one in
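
Note that `len(results)` counts only the documents returned in the current page, which Solr limits to 10 rows by default, while pysolr exposes the total number of matches as `results.hits`. The sketch below assumes a hypothetical helper, `page_params`, that computes the standard Solr `start` and `rows` parameters for a given page. The commented usage shows how it might be combined with `results.hits`.

```python
def page_params(page, rows=10):
    """Return Solr `start`/`rows` parameters for a zero-based page number."""
    if page < 0 or rows <= 0:
        raise ValueError("page must be >= 0 and rows must be > 0")
    return {"start": page * rows, "rows": rows}


print(page_params(0))
print(page_params(2, rows=25))

# With a live server (assumed to be the `solr` object created earlier):
# results = solr.search('case:embolism', **page_params(0, rows=25))
# print("Showing {0} of {1} matches".format(len(results), results.hits))
```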

## Documents about `arterio sclerosis`

In [7]:
results = solr.search('case:"arterio sclerosis"')

print("Found {0} documents".format(len(results)))

for result in results:
    print(result['case'], "\n")

Found 4 documents
['CASE 29 \n\nPatient a man 55 years old; rather fat; subject to frequent attacks \nof winter cough, with aathmatic tendency. For seven years the heart \nhad been noticeably weak and irregular. Puke SO; first sound valvu- \nlar. Apex beat art inch and a half directly below left nipple; no \nmurmurs. No previous rheumatism. Several years ago there was \nsudden and complete loss of memory, the same questions being re- \npeated as soon as answered. The expression was at the time rather \nvacant; the pupils were equal and responded to light; there was no \nmotor paralysis. The amnesia lasted all day, disappearing the follow- \ning morning. The pulse remained 50 for two days. The patient had \nbeen previously very anffimic, and had had much fatigue and anxiety, \nwith digestive disturbance. The urine always remained normal. In \nthe following years there were occasional attacks of transient numbness \nin the left arm and leg, and sometimes faint turns with pallor and \nirr

## Documents about `embolism`

In [8]:
results = solr.search('case:embolism')

print("Found {0} documents".format(len(results)))

for result in results:
    print(result['case'], "\n")

Found 7 documents
["CASE 8 \n\nA coachman, 42 years old, of good family history, ia seen April 20. \nHealth has always been good except for a severe attack of penumonia \nthree years ago, which was followed by phlebitis in the left femoral vein. \nThe left leg has remained somewhat swollen, and has been tense and \nrather painful toward night. It has caused rather more discomfort \nthan usual during the past few days. Yesterday morning he got up \nfeeling as usual , but on reaching the house of bis employer felt nauseated \nand had some diarrhuea, which continued during the day. He felt \nfeverish and weak. Went to work again this momiag, but gave up \nafter half an hour owing to nausea and pain in the lower abdomen, and \nwent to bed. At eleven o'clock had a distinct chill. Was seen for the \nfirst time at 12.45 p.m. The patient was a stout man who looked acutely \nsick. The chest was negative. Owing to a thick fat layer, examina- \ntion of the abdomen was not altogether satisfactory;

## Boolean Operators

In [78]:
results = solr.search('case:"arterio sclerosis" AND case:embolism')

print("Found {0} documents".format(len(results)))

for result in results:
    print(result['case'], "\n")

Found 2 documents
['CASE 29 \n\nPatient a man 55 years old; rather fat; subject to frequent attacks \nof winter cough, with aathmatic tendency. For seven years the heart \nhad been noticeably weak and irregular. Puke SO; first sound valvu- \nlar. Apex beat art inch and a half directly below left nipple; no \nmurmurs. No previous rheumatism. Several years ago there was \nsudden and complete loss of memory, the same questions being re- \npeated as soon as answered. The expression was at the time rather \nvacant; the pupils were equal and responded to light; there was no \nmotor paralysis. The amnesia lasted all day, disappearing the follow- \ning morning. The pulse remained 50 for two days. The patient had \nbeen previously very anffimic, and had had much fatigue and anxiety, \nwith digestive disturbance. The urine always remained normal. In \nthe following years there were occasional attacks of transient numbness \nin the left arm and leg, and sometimes faint turns with pallor and \nirr
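
Besides `AND`, the Lucene syntax also offers `OR` and `NOT`. The sketch below uses two hypothetical helpers, `combine` and `exclude` (neither is part of pysolr), to assemble such queries as strings, parenthesizing each clause so that operator precedence stays explicit.

```python
def combine(operator, *clauses):
    """Join query clauses with AND or OR, parenthesizing each clause."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    if not clauses:
        raise ValueError("at least one clause is required")
    return " {0} ".format(operator).join("({0})".format(c) for c in clauses)


def exclude(query, clause):
    """Keep documents matching `query` but not `clause`."""
    return "({0}) NOT ({1})".format(query, clause)


both = combine("AND", 'case:"arterio sclerosis"', "case:embolism")
either = combine("OR", 'case:"arterio sclerosis"', "case:embolism")
print(both)
print(either)
print(exclude('case:"arterio sclerosis"', "case:embolism"))
```

Each printed string can be passed to `solr.search(...)` in the same way as the hand-written query above.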

## Text Highlighting

In [77]:
from IPython.display import display, HTML

results = solr.search('case:"arterio sclerosis"', **{
        'hl': 'true',
        'hl.fragsize': 50,
        'hl.fl': 'case'
    })

print("Found {0} documents".format(len(results)))

print("=== Full result ===\n")

print(results.highlighting)

print("\n=== Formatting ===")

for result in results:
    display(HTML("Case {0}: {1}<br>".format(result['id'], results.highlighting[result['id']]['case'][0])))

Found 4 documents
=== Full result ===

{'29': {'case': [', are results of cerebral <em>arterio</em>- \n<em>sclerosis</em> with']}, '66': {'case': [' the advance of the <em>arterio</em>-<em>sclerosis</em> lesions \nin']}, '78': {'case': [' general \n<em>arterio</em>-<em>sclerosis</em> with cardiac hypertrophy and']}, '59': {'case': [' for <em>arterio</em>-<em>sclerosis</em> and so for \nmjrocardial']}}

=== Formatting ===
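
The highlighting mapping printed above is keyed first by document id and then by field name, with a list of snippets as the value. As a sketch, the hypothetical helper `first_snippets` below extracts the first snippet per document, collapses the line breaks left over from the scanned text, and skips documents that have no snippet for the field (a case in which a direct `['case'][0]` lookup may fail). The sample data is copied from the output above.

```python
def first_snippets(highlighting, field):
    """Map each document id to its first snippet for `field`, whitespace-normalized."""
    snippets = {}
    for doc_id, fields in highlighting.items():
        fragments = fields.get(field, [])
        if fragments:  # a document may have no highlight for this field
            snippets[doc_id] = " ".join(fragments[0].split())
    return snippets


# Two entries copied from the highlighting output above.
sample = {
    "29": {"case": [", are results of cerebral <em>arterio</em>- \n<em>sclerosis</em> with"]},
    "66": {"case": [" the advance of the <em>arterio</em>-<em>sclerosis</em> lesions \nin"]},
}

for doc_id, snippet in first_snippets(sample, "case").items():
    print("Case {0}: {1}".format(doc_id, snippet))
```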


## Wildcard matching

In [11]:
results = solr.search('case:emboli*')

print("Found {0} documents".format(len(results)))

for result in results:
    print(result['case'], "\n")

Found 9 documents
["CASE 8 \n\nA coachman, 42 years old, of good family history, ia seen April 20. \nHealth has always been good except for a severe attack of penumonia \nthree years ago, which was followed by phlebitis in the left femoral vein. \nThe left leg has remained somewhat swollen, and has been tense and \nrather painful toward night. It has caused rather more discomfort \nthan usual during the past few days. Yesterday morning he got up \nfeeling as usual , but on reaching the house of bis employer felt nauseated \nand had some diarrhuea, which continued during the day. He felt \nfeverish and weak. Went to work again this momiag, but gave up \nafter half an hour owing to nausea and pain in the lower abdomen, and \nwent to bed. At eleven o'clock had a distinct chill. Was seen for the \nfirst time at 12.45 p.m. The patient was a stout man who looked acutely \nsick. The chest was negative. Owing to a thick fat layer, examina- \ntion of the abdomen was not altogether satisfactory;
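
In a wildcard query, `*` stands for zero or more characters and `?` for exactly one, so `emboli*` matches any indexed term that begins with `emboli`. The sketch below illustrates this locally, using Python's `fnmatch` module as a stand-in for Solr's term matching. This is only an approximation (`fnmatch` also understands `[...]` character sets, which Lucene wildcard queries do not), and the word list is made up for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical terms, standing in for tokens in the index.
terms = ["embolism", "embolic", "emboli", "embolus", "thrombosis"]


def matching_terms(pattern, candidates):
    """Return the candidates matched by a `*`/`?` wildcard pattern."""
    return [term for term in candidates if fnmatchcase(term, pattern)]


print(matching_terms("emboli*", terms))
print(matching_terms("embol??", terms))
```

This is why the wildcard query can return more documents than the exact term `embolism`: related forms sharing the prefix are matched as well.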

## Proximity matching

Search for "heart" and "attack" within 7 words of each other (the `~7` slop in the query below).

In [76]:
results = solr.search('case:"heart attack"~7', **{
        'hl': 'true',
        'hl.fragsize': 100,
        'hl.fl': 'case'
    })

print("Found {0} documents".format(len(results)))

for result in results:
    display(HTML("Case {0}: {1}<br>".format(result['id'], results.highlighting[result['id']]['case'][0])))

Found 1 documents
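
The `~N` suffix on a phrase is its slop: roughly, the number of position moves allowed when lining the query words up with the text. The sketch below estimates the smallest slop at which a two-word phrase would match a given text. It is a simplification under stated assumptions: the helper `required_slop` is hypothetical, it handles only two-word phrases, and its tokenizer (lowercased runs of letters) is not Solr's analyzer.

```python
import re


def required_slop(text, first, second):
    """Estimate the smallest slop for the phrase "first second" to match `text`.

    Returns None when either word is absent.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    first_positions = [i for i, tok in enumerate(tokens) if tok == first]
    second_positions = [i for i, tok in enumerate(tokens) if tok == second]
    if not first_positions or not second_positions:
        return None
    # In order and adjacent costs 0; each intervening word costs 1;
    # reversed order costs extra because the words must swap places.
    return min(abs(b - a - 1) for a in first_positions for b in second_positions)


print(required_slop("a heart attack", "heart", "attack"))
print(required_slop("the heart was weak after the attack", "heart", "attack"))
print(required_slop("attack of the heart", "heart", "attack"))
```

A text matches the proximity query when the value returned here is no larger than the number after the tilde.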
