```
├── README.md
├── run.sh
├── src
│   └── pharmacy-counting.py
├── input
│   └── itcont.txt
├── output
│   └── top_cost_drug.txt
└── insight_testsuite
    ├── run_tests.sh
    └── tests
        ├── test_1
        │   ├── input
        │   │   └── itcont.txt
        │   └── output
        │       └── top_cost_drug.txt
        └── your-own-test_1
            ├── input
            │   └── your-own-input-for-itcont.txt
            └── output
                └── top_cost_drug.txt
```

Imagine you are a data engineer working for an online pharmacy. You are asked to generate **a list of all drugs**, together with the **total number of UNIQUE individuals** who prescribed each medication and its **total drug cost**. The list must be sorted in **descending** order of total drug cost and, if there is a tie, by **drug name in ascending order**.

Your program needs to create the output file, `top_cost_drug.txt`, in which every line contains comma-separated (`,`) fields.

Each line of this file should contain these fields:
* drug_name: the exact drug name as shown in the input dataset
* num_prescriber: the number of unique prescribers who prescribed the drug. For the purposes of this challenge, a prescriber is considered the same person if two lines share the same prescriber first and last names.
* total_cost: total cost of the drug across all prescribers

For example, if your input data, **`itcont.txt`**, is
```
id,prescriber_last_name,prescriber_first_name,drug_name,drug_cost
1000000001,Smith,James,AMBIEN,100
1000000002,Garcia,Maria,AMBIEN,200
1000000003,Johnson,James,CHLORPROMAZINE,1000
1000000004,Rodriguez,Maria,CHLORPROMAZINE,2000
1000000005,Smith,David,BENZTROPINE MESYLATE,1500
```

then your output file, **`top_cost_drug.txt`**, would contain the following lines
```
drug_name,num_prescriber,total_cost
CHLORPROMAZINE,2,3000
BENZTROPINE MESYLATE,1,1500
AMBIEN,2,300
```

These files are provided in the `insight_testsuite/tests/test_1/input` and `insight_testsuite/tests/test_1/output` folders, respectively.
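The whole computation can be sketched in a single pass over the file. This is a minimal sketch, not the notebook's implementation below: it assumes the five-column layout shown above and integer costs, and it uses Python's standard `csv` module so that quoted fields are parsed correctly. The function name `top_cost_drugs` is hypothetical.

```python
import csv
import io
from collections import defaultdict


def top_cost_drugs(lines):
    """Hypothetical helper: build the output rows for the pharmacy-counting task.

    Assumes the header id,prescriber_last_name,prescriber_first_name,drug_name,drug_cost
    and integer drug costs.
    """
    prescribers = defaultdict(set)  # drug_name -> {(first, last), ...}
    totals = defaultdict(int)       # drug_name -> summed drug_cost

    reader = csv.reader(lines)
    next(reader)  # skip the header row
    for row in reader:
        if not row:  # tolerate blank lines
            continue
        _, last, first, drug_name, cost = row
        prescribers[drug_name].add((first, last))
        totals[drug_name] += int(cost)

    # total cost descending, then drug name ascending
    ranked = sorted(totals, key=lambda name: (-totals[name], name))
    rows = ["drug_name,num_prescriber,total_cost"]
    rows += [f"{name},{len(prescribers[name])},{totals[name]}" for name in ranked]
    return rows


sample = """id,prescriber_last_name,prescriber_first_name,drug_name,drug_cost
1000000001,Smith,James,AMBIEN,100
1000000002,Garcia,Maria,AMBIEN,200
1000000003,Johnson,James,CHLORPROMAZINE,1000
1000000004,Rodriguez,Maria,CHLORPROMAZINE,2000
1000000005,Smith,David,BENZTROPINE MESYLATE,1500
"""
for row in top_cost_drugs(io.StringIO(sample)):
    print(row)
```

Run on the sample input, this prints the four lines of the expected `top_cost_drug.txt` shown above. Collecting prescribers in a set keyed by `(first, last)` handles the uniqueness rule, and the tuple sort key handles the tie-break in one call.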



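```python
import os

# look for the input file and write the output file in the current working directory
# (os.getcwd() returns a plain string; IPython's `!pwd` returns a list-like SList)
path = os.getcwd()
input_path = os.path.join(path, "itcont.txt")
output_path = os.path.join(path, "top_cost_drug.txt")
```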

```python
def pharm_count(input_file, output_file):

    # read the raw lines of the input file
    with open(input_file, 'r') as f:
        data = f.readlines()

    # ignore the header and any blank lines
    data = [line for line in data[1:] if line.strip()]

    # count unique prescribers per drug
    dict_name = process_name(data)

    # total cost per drug
    dict_cost = process_cost(data)

    # merge the two dictionaries: drug_name -> [num_prescriber, total_cost]
    dict_name_cost = {k: [v, dict_cost[k]] for k, v in dict_name.items()}

    # sort by total cost (descending), breaking ties by drug name (ascending)
    sorted_by_value = sorted(dict_name_cost.items(), key=lambda x: (-x[1][1], x[0]))

    # write the output file; sorted() returns a list of (drug_name, [count, cost]) tuples
    with open(output_file, 'w') as output:
        output.write('drug_name,num_prescriber,total_cost\n')
        for k, v in sorted_by_value:
            output.write(k + ',' + str(v[0]) + ',' + str(v[1]) + '\n')
```
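Python's `sorted` accepts a tuple as the key, which gives descending cost with ascending-name tie-breaks in one call: negate the numeric cost and leave the name as-is. Because `sorted` is stable (also with `reverse=True`), an equivalent two-pass approach sorts by name first and then by cost in reverse. The sample dictionary below is made up and only mirrors the notebook's `drug_name: [num_prescriber, total_cost]` shape.

```python
# Made-up sample in the notebook's shape: drug_name -> [num_prescriber, total_cost]
dict_name_cost = {"D": [10, 10], "C": [3, 20], "B": [1, 10], "A": [2, 20]}

# One pass: negate the cost for descending order; the name breaks ties ascending.
one_pass = sorted(dict_name_cost.items(), key=lambda item: (-item[1][1], item[0]))

# Two passes: sort by name, then stably by cost descending.
# reverse=True keeps equal-cost items in their name order.
by_name = sorted(dict_name_cost.items(), key=lambda item: item[0])
two_pass = sorted(by_name, key=lambda item: item[1][1], reverse=True)

print(one_pass)
# [('A', [2, 20]), ('C', [3, 20]), ('B', [1, 10]), ('D', [10, 10])]
print(one_pass == two_pass)  # True
```

Negation only works for numeric keys; when the descending field is a string, the two-pass form is the usual alternative.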

```python
def process_name(data):
    '''
    Count the unique prescribers of each drug.

    Each line is split on commas to capture the prescriber's last and
    first names and the drug name. A prescriber is identified by the
    (first name, last name) pair, so repeated lines from the same
    prescriber are counted once.

    example:
    1000000001,Smith,James,AMBIEN,100
    1000000002,Garcia,Maria,AMBIEN,200
    1000000003,Garcia,Maria,AMBIEN,100
    gives
    prescribers = {AMBIEN: {(James, Smith), (Maria, Garcia)}}
    returned    = {AMBIEN: 2}
    '''
    drug = {}

    for line in data:
        # the id and cost columns are not needed here
        _, last, first, drug_name, _ = line.split(',')
        drug.setdefault(drug_name, set()).add((first, last))

    # replace each set of prescribers by its size
    for k, v in drug.items():
        drug[k] = len(v)

    return drug
```
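Joining first and last names with `'-'` into a single string can, in principle, merge two different prescribers when a name itself contains a hyphen; keying on a `(first, last)` tuple keeps them apart. The names below are invented purely for illustration.

```python
# Invented names: a hyphenated first name vs. a hyphenated last name.
rows = [("Mary-Ann", "Smith"), ("Mary", "Ann-Smith")]

joined = {first + "-" + last for first, last in rows}
tupled = {(first, last) for first, last in rows}

print(len(joined))  # 1: both become "Mary-Ann-Smith"
print(len(tupled))  # 2: the pairs stay distinct
```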

```python
def process_cost(data):
    '''
    Sum the drug cost of each drug.

    Each line is split on commas to capture the drug name and its cost,
    and the costs are accumulated per drug name.

    example:
    1000000001,Smith,James,AMBIEN,100
    1000000002,Garcia,Maria,AMBIEN,200
    1000000003,Garcia,Maria,AMBIEN,100
    gives
    drug = {AMBIEN: 400}
    '''
    drug = {}
    for line in data:
        *_, drug_name, cost = line.split(',')
        # int() ignores the trailing newline left by readlines()
        drug[drug_name] = drug.get(drug_name, 0) + int(cost)

    return drug
```
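`line.split(',')` assumes that no field contains a comma. If the dataset quotes fields with embedded commas (an assumption; the sample data above does not), the standard-library `csv` module parses them correctly. Below is a sketch of the same cost aggregation on top of `csv.reader`, under the hypothetical name `process_cost_csv`, with an invented drug name containing a comma.

```python
import csv


def process_cost_csv(data):
    """Hypothetical variant of process_cost: sum drug_cost per drug_name.

    `data` is an iterable of lines without the header row.
    """
    drug = {}
    for row in csv.reader(data):
        if not row:  # skip blank lines
            continue
        *_, drug_name, cost = row
        drug[drug_name] = drug.get(drug_name, 0) + int(cost)
    return drug


# Invented lines: the second drug name contains a quoted comma.
lines = ['1000000001,Smith,James,AMBIEN,100\n',
         '1000000002,Garcia,Maria,"INSULIN, REGULAR",250\n',
         '1000000003,Garcia,Maria,AMBIEN,100\n']
print(process_cost_csv(lines))
# {'AMBIEN': 200, 'INSULIN, REGULAR': 250}
```

With a plain `split(',')`, the second line would yield six fields and a mangled drug name, so the quoted form is worth handling if it can occur in the real data.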

```python
# simple tests for process_cost, process_name, and the merge/sort step of pharm_count
test_data = ['1000000001,Smith,James,AMBIEN,100',
             '1000000002,Garcia,Maria,AMBIEN,200',
             '1000000003,Garcia,Maria,AMBIEN,100']
assert process_cost(test_data) == {'AMBIEN': 400}
assert process_name(test_data) == {'AMBIEN': 2}

dict_name = {'A': 2, 'B': 1, 'C': 3, 'D': 10}
dict_cost = {'A': 20, 'B': 10, 'C': 20, 'D': 10}
dict_name_cost = {k: [v, dict_cost[k]] for k, v in dict_name.items()}

# total cost descending, then drug name ascending
sorted_by_value = sorted(dict_name_cost.items(), key=lambda x: (-x[1][1], x[0]))
sorted_by_value
```

```
[('A', [2, 20]), ('C', [3, 20]), ('B', [1, 10]), ('D', [10, 10])]
```