# Function Application and Mapping Practical Solutions

Please note that there are many possible ways to complete the practical tasks, and they are not limited to the solutions provided in this document. However, the output of your code should exactly match the solutions below.

---

1.  Start a new Jupyter Notebook

2.  Import the `pandas` Python package using the standard alias: `pd`, as well as `matplotlib.pyplot` as `plt`

In [2]:
import pandas as pd
import matplotlib.pyplot as plt

3. Read the file `data/spending_ch6_practical.csv` into a `pandas` DataFrame named `spending_df`, with the index column set to `unique_id`.

In [4]:
spending_df = pd.read_csv('data/spending_ch6_practical.csv', index_col='unique_id')
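To see what `index_col` does without the course data, here is a minimal sketch on a tiny inline CSV. The rows and medication names are hypothetical; only the column names are taken from this practical. `io.StringIO` stands in for the file path.

```python
import io

import pandas as pd

# Hypothetical miniature CSV using the column names from this practical.
csv_text = io.StringIO(
    "unique_id,specialty,medication,nb_beneficiaries,spending\n"
    "1,CARDIOLOGY,DRUG_A,120,1500.0\n"
    "2,CARDIOLOGY,DRUG_B,80,900.5\n"
    "3,DERMATOLOGY,DRUG_C,45,300.25\n"
)

# index_col promotes 'unique_id' from an ordinary column to the row index.
toy_df = pd.read_csv(csv_text, index_col="unique_id")

print(toy_df.index.name)     # unique_id
print(list(toy_df.columns))  # ['specialty', 'medication', 'nb_beneficiaries', 'spending']
```

Because `unique_id` becomes the index, it no longer appears among the regular columns.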

4. Filter out any specialties that have fewer than 200 records or for which the total number of beneficiaries is less than 15,000.

  * Furthermore, save your results as a sorted `DataFrame`, sorted by `specialty` (ascending), then `nb_beneficiaries` (descending), then `spending` (descending).

In [5]:
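def filter_spending(x):
    # Keep a specialty only if it has at least 200 records
    # and at least 15,000 beneficiaries in total
    return (x.shape[0] >= 200) and (x.nb_beneficiaries.sum() >= 15000)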

spending_by_specialty = spending_df.groupby('specialty')
filtered_spending_df = spending_by_specialty.filter(filter_spending)
filtered_sorted_spending_df = (
    filtered_spending_df.sort_values(by=['specialty', 
                                         'nb_beneficiaries', 
                                         'spending'], 
                                     ascending=[True, False, False]))

  * How many unique specialties pass this filtering?

In [6]:
filtered_sorted_spending_df.specialty.unique().shape[0]

6
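The filter-then-sort pattern above can be tried on a toy `DataFrame`. This is a sketch under stated assumptions: the data is made up, and the thresholds are hypothetical scaled-down stand-ins for 200 records and 15,000 beneficiaries so that the effect is visible on a few rows.

```python
import pandas as pd

# Hypothetical toy data with the same column names as spending_df.
toy = pd.DataFrame({
    "specialty": ["A", "A", "A", "B", "C", "C"],
    "nb_beneficiaries": [10, 30, 20, 500, 50, 7],
    "spending": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

MIN_RECORDS = 2         # hypothetical stand-in for 200
MIN_BENEFICIARIES = 50  # hypothetical stand-in for 15,000

def keep_group(g):
    # g is the sub-DataFrame for one specialty; True keeps all of its rows.
    return len(g) >= MIN_RECORDS and g["nb_beneficiaries"].sum() >= MIN_BENEFICIARIES

# B is dropped (only 1 record); A (3 records, 60 beneficiaries) and
# C (2 records, 57 beneficiaries) are kept.
kept = toy.groupby("specialty").filter(keep_group)

# Ascending specialty, then descending beneficiaries, then descending spending.
result = kept.sort_values(
    by=["specialty", "nb_beneficiaries", "spending"],
    ascending=[True, False, False],
)
print(result)
print(result["specialty"].nunique())  # 2
```

Note that `filter` returns the original rows (with their original index) rather than one row per group, which is why the sort can still use the `specialty` column.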

5. Let us walk through a *split-apply-combine* example step by step.
  * First, *split* the `DataFrame`: group `spending_df` by the entries in both the `specialty` and `medication` columns, and save the resulting `GroupBy` object as `medication_spending`.

In [7]:
medication_spending = spending_df.groupby(['specialty','medication'])

  * Second, *apply* the `GroupBy` `sum()` method to the `spending` column and *combine* the per-group results; save the resulting `Series` as `medication_spending_series`.

In [8]:
medication_spending_series = medication_spending['spending'].sum()
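The two steps above can be reproduced on a small hypothetical `DataFrame` to make the shape of the result visible. The data is invented; the point is that grouping on two columns yields a `Series` whose index is a two-level `MultiIndex` (`specialty`, `medication`).

```python
import pandas as pd

# Hypothetical toy data with the same column names as spending_df.
toy = pd.DataFrame({
    "specialty": ["A", "A", "A", "B", "B"],
    "medication": ["x", "x", "y", "x", "z"],
    "spending": [10.0, 5.0, 2.5, 7.0, 1.0],
})

# Split: one group per (specialty, medication) pair.
grouped = toy.groupby(["specialty", "medication"])

# Apply + combine: sum each group's spending; the group keys become
# a two-level MultiIndex on the resulting Series.
totals = grouped["spending"].sum()
print(totals)
```

The two rows for (`A`, `x`) collapse into a single total of 15.0, while every other pair keeps its single value.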

7. Group `medication_spending_series` by specialty and filter out the specialties for which the sum of the top 2 medicines in terms of `spending` is less than 80% of the total spending. For instance, the sum of the `spending` for the top 2 entries for `ADDICTION MEDICINE` is $817.88 + 82.62 = 900.5$ and the total `spending` is $920.06$. Since $900.5 / 920.06 > 0.8$, we retain this specialty. However, the sum of the top 2 medicines in `ALLERGY/IMMUNOLOGY` is $79261.85 + 34318.54 = 113580.39$, while the total is $189174.06$. Since $113580.39 / 189174.06 < 0.8$, we discard this specialty.

In [15]:
def filter_medication(x):
    # Keep a specialty if its top 2 medicines account for at least 80% of its spending
    return (x.nlargest(2).sum() / x.sum()) >= 0.8

medication_spending_by_specialty = medication_spending_series.groupby('specialty')
filtered_medication_spending = medication_spending_by_specialty.filter(filter_medication)
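As a sanity check, the sketch below first recomputes the two worked ratios from the task text, then applies the same top-2 share rule to a small hypothetical `MultiIndex` `Series`. Specialty and medication labels are made up. It groups with `level="specialty"`, which should be equivalent to passing the level name as `groupby('specialty')` in the solution.

```python
import pandas as pd

# Recompute the two worked ratios quoted in the task text.
addiction_ratio = (817.88 + 82.62) / 920.06
allergy_ratio = (79261.85 + 34318.54) / 189174.06
print(round(addiction_ratio, 4), round(allergy_ratio, 4))

# Hypothetical per-(specialty, medication) totals, shaped like
# medication_spending_series.
idx = pd.MultiIndex.from_tuples(
    [("A", "x"), ("A", "y"), ("A", "z"),
     ("B", "u"), ("B", "v"), ("B", "w")],
    names=["specialty", "medication"],
)
toy_series = pd.Series([90.0, 8.0, 2.0, 40.0, 35.0, 25.0], index=idx, name="spending")

def top2_share_at_least_80(s):
    # s holds one specialty's per-medication totals.
    return s.nlargest(2).sum() / s.sum() >= 0.8

# A: (90 + 8) / 100 = 0.98 -> kept; B: (40 + 35) / 100 = 0.75 -> dropped.
kept = toy_series.groupby(level="specialty").filter(top2_share_at_least_80)
print(kept)
```

Like the `DataFrame` version, `filter` on a grouped `Series` returns all original entries of the groups that pass, not just the top two.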

  * Print only the top two entries for each specialty in the resulting `Series`.

In [19]:
filtered_medication_spending.groupby('specialty').nlargest(2)

specialty                                 specialty                                 medication                    
ADDICTION MEDICINE                        ADDICTION MEDICINE                        BUSPIRONE HCL                        817.88
                                                                                    LAMOTRIGINE                           82.62
CARDIAC ELECTROPHYSIOLOGY                 CARDIAC ELECTROPHYSIOLOGY                 RIVAROXABAN                       169196.74
                                                                                    DRONEDARONE HCL                    24922.61
CARDIAC SURGERY                           CARDIAC SURGERY                           INSULIN GLARGINE,HUM.REC.ANLOG     11990.01
                                                                                    HYDROCODONE/ACETAMINOPHEN            442.91
CERTIFIED NURSE MIDWIFE                   CERTIFIED NURSE MIDWIFE                   MIRABEGRON                       