# Vestwell Python Screener

If you have any questions or feel you are making assumptions, please record them in this notebook or in comments if you'd rather work in a `.py` file.  If you get stuck, try to explain in words how you would complete the exercise.

### Background

Vestwell provides a wide variety of investment choices to its users.  Participants in a retirement plan can choose between a pre-determined set of funds or they can choose their own custom set of funds from a list of choices.  Advisors can create their own models with a custom set of funds in which participants can choose to invest.  As a result, there are thousands of unique models on the Vestwell platform.  

One of Vestwell's partners has the same list of models in its database.  This partner maintains an up-to-date list of funds for each model in its database.  For example, when a fund closes and is replaced by a new one, Vestwell's partner updates the model with the new fund in its database, but not in Vestwell's.  For this reason, Vestwell's database and our partner's database will get out of sync over time.  Unless, that is, you can build a Python script to reconcile the two databases.  We're rooting for you!

### The Data

Here's a high-level overview of the data.  We'll get into more details below as we dig in.

**Vestwell Data**

Each `program_id` has many `model_id`s.  Each `model_id` has many `symbol`s.

* model.csv:  Associations of programs to models.
* model_prop.csv:  Association of models to symbols.

**Partner Data**

Each `PLANID` has many `FUNDID`s.

* partner.csv:  Association of `PLANID` to `FUNDID` and `PLANINVCLOSEDATE`.  

**Some extra notes**

* The `FUNDID` in our partner's data is equivalent to `symbol` in Vestwell's data.  These are also referred to as "funds".
* The `PLANID` in our partner's data has information that is equivalent to the `program_id` in Vestwell's database (more details below in Step 2).
* The `PLANINVCLOSEDATE` in our partner's database is the date when a fund was closed.  If there isn't a date, then the fund has not been closed.
* Our partner's data sometimes contains funds named "Medicham" or "Electrike"; we ignore these.

### Goal
The goal of this exercise is to compare Vestwell's data with our partner's data.  We want to figure out whether Vestwell's model data is the same as our partner's data.  We consider our partner's database the source of truth since their database will remain updated if there are any changes to funds.  Here's specifically what we are asking:

1.  Does the list of funds for each `model_id` in each `program_id` in Vestwell's database match the list of funds for the same `program_id` in our partner's database?  If there are any mismatches, which funds are missing from each database?  Remember, our partner doesn't use `model_id`, so whichever funds they have for a particular `program_id` should match the list of funds that Vestwell has for every `model_id` that uses that `program_id`.

For example, if Vestwell's database has funds A, B and C for a `model_id` in a particular `program_id` and our partner's database has funds B, C, and D for the same `program_id` we would report that fund A is missing from our partner's database and that fund D is missing from our database for that `model_id` and `program_id`.

2.  Are there any funds in Vestwell's database that have closed?  If so, what are they for each `model_id` in each `program_id`?

For example, if our database has funds D, E, and F for a certain `model_id` in a `program_id` and our partner's database shows that fund D closed on 11/1/2019, then we would report that fund D has closed for that `model_id` and `program_id`.
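Both questions reduce to set operations on fund names. The following is a minimal sketch that uses the letters from the two examples above. `partner_rows` is a made-up frame, and the `MM/DD/YYYY` date format is an assumption based on values such as `11/01/2018` in `partner.csv`:

```python
import pandas as pd

# Question 1: funds for one model_id at Vestwell vs. the same program_id at the partner
vestwell_funds = {'A', 'B', 'C'}
partner_funds = {'B', 'C', 'D'}
missing_at_partner = sorted(vestwell_funds - partner_funds)  # at Vestwell only
missing_at_vw = sorted(partner_funds - vestwell_funds)       # at the partner only

# Question 2: treat a fund as closed when PLANINVCLOSEDATE parses as a date;
# 'active' or a blank value becomes NaT because of errors='coerce'
partner_rows = pd.DataFrame({
    'FUNDID': ['D', 'E', 'F'],
    'PLANINVCLOSEDATE': ['11/01/2019', 'active', None],
})
close_dates = pd.to_datetime(partner_rows['PLANINVCLOSEDATE'],
                             format='%m/%d/%Y', errors='coerce')
closed_at_partner = set(partner_rows.loc[close_dates.notna(), 'FUNDID'])

# Only closed funds that this Vestwell model still holds are reported
model_funds = {'D', 'E', 'F'}
fund_closed = sorted(model_funds & closed_at_partner)

print(missing_at_partner, missing_at_vw, fund_closed)  # ['A'] ['D'] ['D']
```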

Ideally, the output is in a form that can be passed to a Business Analyst to take action on.  For example, the output could look something like this:

| program_id | model_id | fund_missing_at_vw | fund_missing_at_partner | fund_closed |
|------------|--|---------------|--------------------|--------|
| 1          |1| None          | None               | None   |
| 1          |2| D, Z             | A                  | F   |
| 2          |2| None             | None               | F      |

### Some Hints
* We value correct output over efficient code.  
* Does your code execute fully without errors?
* What edge cases have you considered?  How could you handle them?
* Could another engineer read your code and easily understand what's going on and why you did things a certain way?
* Most analysts don't use Python or Jupyter Notebooks.  How could you give them the output of your code?
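For the last hint, one option is to flatten each set of funds into a readable string and write a CSV, which Excel or Google Sheets can open directly. The sketch below is minimal. The `results` frame mirrors the example output table above, and the file name `reconciliation_report.csv` is arbitrary:

```python
import pandas as pd

# Hypothetical final results: each fund cell holds a set of fund names
results = pd.DataFrame({
    'program_id': [1, 1, 2],
    'model_id': [1, 2, 2],
    'fund_missing_at_vw': [set(), {'D', 'Z'}, set()],
    'fund_missing_at_partner': [set(), {'A'}, set()],
    'fund_closed': [set(), {'F'}, {'F'}],
})

def to_cell(funds):
    """Render a set of funds as a sorted, comma-separated string, or 'None' if empty."""
    return ', '.join(sorted(funds)) if funds else 'None'

fund_cols = ['fund_missing_at_vw', 'fund_missing_at_partner', 'fund_closed']
report = results.copy()
# Convert every set in the fund columns into a display string
report[fund_cols] = report[fund_cols].apply(lambda col: col.map(to_cell))

# A plain CSV that a Business Analyst can open without Python
report.to_csv('reconciliation_report.csv', index=False)
```

`DataFrame.to_excel` is an alternative if an Excel writer such as `openpyxl` is installed.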

You should find the following mismatches for `program_id` 1 and `model_id` 268:
* Funds missing at Vestwell: 
    
        'Hitmontop',
        'Smoochum',
        'Gastly',
        'Teddiursa',
        'Meowth',
        'Sneasel',
        'Xatu',
        'Growlithe',
        'Torchic',
        'ManectricMega Manectric',
        'Smeargle',
        'Stantler',
        'Tyrogue'
    
    
* Funds missing at Partner:
    
        'Persian', 'Psyduck', 'Rattata', 'Dugtrio'
    
    
* Closed funds:

        None

## Table Schema


![Schema](data/Vestwell_challange.png)

# Step 0
Import any packages you'll need

In [124]:
import pandas as pd
import numpy as np
import math
import copy

In [50]:
pd.set_option('display.max_rows', 800)
pd.set_option('display.max_colwidth', None)

# Step 1
Import `partner.csv`, `model.csv`, and `model_prop.csv`.

In [2]:
model = pd.read_csv('data_V2/model.csv')
model.head()

| | model_id | program_id |
|---|---|---|
| 0 | 28 | 3 |
| 1 | 34 | 4 |
| 2 | 42 | 4 |
| 3 | 24 | 3 |
| 4 | 64 | 8 |


In [3]:
model.shape

(187, 2)

In [4]:
model_prop = pd.read_csv('data_V2/model_props.csv')
model_prop.head()

| | model_props_id | model_id | symbol |
|---|---|---|---|
| 0 | 541 | 80 | Bulbasaur |
| 1 | 542 | 80 | Ivysaur |
| 2 | 543 | 80 | Venusaur |
| 3 | 544 | 80 | VenusaurMega Venusaur |
| 4 | 545 | 80 | Charmander |


In [5]:
model_prop.shape

(729, 3)

In [6]:
partner = pd.read_csv('data_V2/partner.csv')
partner.head()

| | PLANID | PLANINVCLOSEDATE | FUNDID |
|---|---|---|---|
| 0 | VW0008000039 | active | Medicham |
| 1 | VW0008000039 | active | Arcanine |
| 2 | VW0008000039 | 11/01/2018 | Clefairy |
| 3 | VW0008000039 | active | Zubat |
| 4 | VW0008000039 | active | Nidoking |


In [7]:
partner.shape

(445, 3)

# Step 2 - working with the `partner.csv` data
Extract the `program_id` from the `PLANID` column in the `partner` dataframe.  The `program_id` is the first four characters in `PLANID` after "VW".  It's usually an integer.  If instead of digits, those characters are equal to "PALL" then the `program_id` = 1.  Drop any other rows remaining that do not have four digits in the first four characters after "VW" in the `PLANID` column.  For example, if a row in `PLANID` has `VWLASP000` then it should be dropped because it has `LASP` after `VW` instead of four digits.
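You can also apply the same rule to the whole column at once with pandas string methods. The sketch below runs on a few sample `PLANID` values. The Series `plan_ids` is illustrative; on the real data it would be `partner['PLANID']`. `Series.str.fullmatch` requires pandas 1.1 or later:

```python
import pandas as pd

plan_ids = pd.Series(['VW0008000039', 'VWPALL000076', 'VWLASP000', 'VW0029000243'])
code = plan_ids.str[2:6]                                # the four characters after 'VW'
is_digits = code.str.fullmatch(r'[0-9]{4}', na=False)  # exactly four ASCII digits

# Digit codes become numbers (leading zeros dropped), everything else becomes NaN,
# and 'PALL' is then overwritten with program 1
program_id = pd.to_numeric(code.where(is_digits), errors='coerce')
program_id = program_id.mask(code == 'PALL', 1)

print(program_id.tolist())  # [8.0, 1.0, nan, 29.0]
```

Rows whose result is NaN (here `VWLASP000`) are the ones to drop.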

In [9]:
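# extract the program_id from a partner PLANID, e.g. 'VW0008000039' -> 8 and 'VWPALL000076' -> 1
# returns NaN for anything else (e.g. 'VWLASP000') so those rows can be dropped
def get_plan(plan_id):
    if not isinstance(plan_id, str):
        return np.nan
    code = plan_id[2:6]  # the four characters after 'VW'
    if code == 'PALL':
        return 1
    # accept exactly four digits; int() also removes leading zeros ('0008' -> 8)
    if len(code) == 4 and code.isdecimal():
        return int(code)
    # otherwise return NaN to be dropped
    return np.nan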

In [10]:
partner.head(10)

| | PLANID | PLANINVCLOSEDATE | FUNDID |
|---|---|---|---|
| 0 | VW0008000039 | active | Medicham |
| 1 | VW0008000039 | active | Arcanine |
| 2 | VW0008000039 | 11/01/2018 | Clefairy |
| 3 | VW0008000039 | active | Zubat |
| 4 | VW0008000039 | active | Nidoking |
| 5 | VW0008000039 | active | Jigglypuff |
| 6 | VW0008000039 | active | CharizardMega Charizard X |
| 7 | VW0008000039 | active | Electrike |
| 8 | VW0008000039 | active | Growlithe |
| 9 | VW0008000039 | active | Fearow |


In [11]:
partner['partner_program_id'] = 0

In [12]:
partner['partner_program_id'] = partner['PLANID'].apply(get_plan)

In [14]:
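# drop the rows whose PLANID did not yield a program_id (NaN); NaNs in other columns are left alone
partner.dropna(subset=['partner_program_id'], inplace=True)
partner['partner_program_id'] = partner['partner_program_id'].astype(int)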

In [15]:
partner.partner_program_id.value_counts()

27    39
9     36
15    35
8     31
12    28
30    26
21    25
14    25
13    25
20    24
17    24
29    22
19    20
16    19
1     18
28    17
24    16
10    15
Name: partner_program_id, dtype: int64

In [16]:
partner.shape

(445, 4)

In [17]:
partner.head(10)

| | PLANID | PLANINVCLOSEDATE | FUNDID | partner_program_id |
|---|---|---|---|---|
| 0 | VW0008000039 | active | Medicham | 8 |
| 1 | VW0008000039 | active | Arcanine | 8 |
| 2 | VW0008000039 | 11/01/2018 | Clefairy | 8 |
| 3 | VW0008000039 | active | Zubat | 8 |
| 4 | VW0008000039 | active | Nidoking | 8 |
| 5 | VW0008000039 | active | Jigglypuff | 8 |
| 6 | VW0008000039 | active | CharizardMega Charizard X | 8 |
| 7 | VW0008000039 | active | Electrike | 8 |
| 8 | VW0008000039 | active | Growlithe | 8 |
| 9 | VW0008000039 | active | Fearow | 8 |


# Step 3
Check if the funds match for each `program_id`.  In `partner.csv` the funds are in the `FUNDID` column and for `model_prop.csv` the funds are in the `symbol` column.  If there are any mismatches, return a list of which funds are missing from each database for each `model_id` in each `program_id`.
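One way to approach Step 3 is to build one set of funds per Vestwell model and one set per partner program, then take set differences. This keeps the per-`model_id` detail that a single fund-level merge loses. The sketch below uses small made-up frames that reuse the notebook's column names. They are prefixed `toy_` so they don't overwrite the real data. The sketch leaves out the Medicham/Electrike filter and the closed-fund handling:

```python
import pandas as pd

# Made-up frames that reuse the notebook's column names
toy_model = pd.DataFrame({'model_id': [10, 11], 'program_id': [1, 1]})
toy_model_prop = pd.DataFrame({'model_id': [10, 10, 10, 11],
                               'symbol': ['A', 'B', 'C', 'B']})
toy_partner = pd.DataFrame({'partner_program_id': [1, 1, 1],
                            'FUNDID': ['B', 'C', 'D']})

# One set of funds per (program_id, model_id); a model with no props gets an empty set
vw_sets = (toy_model.merge(toy_model_prop, on='model_id', how='left')
                    .groupby(['program_id', 'model_id'])['symbol']
                    .agg(lambda s: set(s.dropna()))
                    .reset_index())

# One set of funds per program at the partner
partner_sets = toy_partner.groupby('partner_program_id')['FUNDID'].agg(set)

report_rows = []
for rec in vw_sets.itertuples(index=False):
    # A program the partner does not have at all is compared against an empty set
    p_funds = partner_sets.get(rec.program_id, set())
    report_rows.append({
        'program_id': rec.program_id,
        'model_id': rec.model_id,
        'fund_missing_at_vw': sorted(p_funds - rec.symbol),
        'fund_missing_at_partner': sorted(rec.symbol - p_funds),
    })
toy_report = pd.DataFrame(report_rows)
print(toy_report)
```

On the real data, `toy_model`, `toy_model_prop` and `toy_partner` would be replaced by `model`, `model_prop` and the filtered `partner` frame.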

### Merge model and model_prop df on model_id column

In [20]:
vw_df = pd.merge(model, model_prop,  how='left', left_on=['model_id'], 
                     right_on = ['model_id'])

In [21]:
vw_df.model_id.value_counts().count()

187

#### Note: no model_ids were dropped by the left join

In [22]:
model.model_id.value_counts().count()

187

In [23]:
model_prop.model_id.value_counts().count()

187

#### Looking at values (delete later)

In [None]:
model[model.model_id==28]

In [None]:
model_prop[model_prop.model_id==28]

In [None]:
vw_df.head(10)

In [None]:
vw_df.shape

In [None]:
vw_df.isnull().sum()

### Now combine vw_df with partner

In [25]:
combined_df = pd.merge(partner, vw_df,  how='outer', left_on=['FUNDID', 'partner_program_id'], 
                     right_on = ['symbol','program_id'])
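`pd.merge` can also record where each row came from when you pass `indicator=True`. This avoids relying on a `'missing'` placeholder later. The sketch below uses made-up rows. Matching happens per program, so a `left_only` fund has no match in any model of that program, not just in one model:

```python
import pandas as pd

# Made-up partner and Vestwell rows for a single program
toy_partner = pd.DataFrame({'partner_program_id': [1, 1], 'FUNDID': ['B', 'D']})
toy_vw = pd.DataFrame({'program_id': [1, 1], 'model_id': [7, 7], 'symbol': ['A', 'B']})

# indicator=True adds a categorical '_merge' column: 'left_only', 'right_only' or 'both'
merged = pd.merge(toy_partner, toy_vw, how='outer',
                  left_on=['FUNDID', 'partner_program_id'],
                  right_on=['symbol', 'program_id'],
                  indicator=True)

only_at_partner = merged.loc[merged['_merge'] == 'left_only', 'FUNDID'].tolist()
only_at_vw = merged.loc[merged['_merge'] == 'right_only', 'symbol'].tolist()
print(only_at_partner, only_at_vw)  # ['D'] ['A']
```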

In [26]:
combined_df

| | PLANID | PLANINVCLOSEDATE | FUNDID | partner_program_id | model_id | program_id | model_props_id | symbol |
|---|---|---|---|---|---|---|---|---|
| 0 | VW0008000039 | active | Medicham | 8.0 | | | | |
| 1 | VW0008000039 | active | Arcanine | 8.0 | 65.0 | 8.0 | 311.0 | Arcanine |
| 2 | VW0008000039 | 11/01/2018 | Clefairy | 8.0 | 65.0 | 8.0 | 297.0 | Clefairy |
| 3 | VW0008000039 | active | Zubat | 8.0 | 65.0 | 8.0 | 304.0 | Zubat |
| 4 | VW0008000039 | active | Nidoking | 8.0 | 65.0 | 8.0 | 296.0 | Nidoking |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 780 | | | | | 278.0 | 26.0 | 1962.0 | Metapod |
| 781 | | | | | 279.0 | 26.0 | 1963.0 | Butterfree |
| 782 | | | | | 280.0 | 26.0 | 1964.0 | Weedle |
| 783 | | | | | 281.0 | 26.0 | 1965.0 | Kakuna |


In [27]:
combined_df.shape

(785, 8)

In [28]:
combined_df.program_id.value_counts().sum()

729

In [29]:
combined_df.partner_program_id.value_counts().sum()

509

In [30]:
combined_df[(combined_df.partner_program_id==1)|(combined_df.program_id==1)]

| | PLANID | PLANINVCLOSEDATE | FUNDID | partner_program_id | model_id | program_id | model_props_id | symbol |
|---|---|---|---|---|---|---|---|---|
| 482 | VWPALL000076 | active | Medicham | 1.0 | | | | |
| 483 | VWPALL000076 | active | Hitmontop | 1.0 | 269.0 | 1.0 | 1906.0 | Hitmontop |
| 484 | VWPALL000076 | active | Smoochum | 1.0 | 269.0 | 1.0 | 1910.0 | Smoochum |
| 485 | VWPALL000076 | active | Diglett | 1.0 | 268.0 | 1.0 | 1897.0 | Diglett |
| 486 | VWPALL000076 | active | Diglett | 1.0 | 269.0 | 1.0 | 1907.0 | Diglett |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 668 | | | | | 262.0 | 1.0 | 1862.0 | Venonat |
| 669 | | | | | 259.0 | 1.0 | 1838.0 | Donphan |
| 670 | | | | | 260.0 | 1.0 | 1846.0 | Donphan |
| 671 | | | | | 259.0 | 1.0 | 1839.0 | Porygon2 |


### Drop rows whose FUNDID is "Medicham" or "Electrike"

In [31]:
dropindex = combined_df[combined_df['FUNDID'] == 'Medicham' ].index

In [32]:
#delete later
dropindex

Int64Index([0, 76, 91, 145, 223, 272, 305, 331, 347, 396, 443, 482], dtype='int64')

In [33]:
combined_df.drop(dropindex, inplace=True)

In [34]:
dropindex = combined_df[combined_df['FUNDID'] == 'Electrike' ].index

In [35]:
#delete later
dropindex

Int64Index([  7,  61,  88, 114, 131, 152, 185, 220, 229, 262, 295, 320, 342,
            372, 408, 430, 450, 500],
           dtype='int64')

In [36]:
combined_df.drop(dropindex, inplace=True)
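You can combine the two drops into a single boolean mask with `Series.isin`. The sketch below runs on a made-up column; `IGNORED_FUNDS` is a name chosen here. Applying the same filter to `partner` before the merge would keep these funds out of every later step:

```python
import pandas as pd

IGNORED_FUNDS = ['Medicham', 'Electrike']

toy_funds = pd.DataFrame({'FUNDID': ['Medicham', 'Arcanine', 'Electrike', 'Zubat']})
# Keep every row whose FUNDID is not one of the ignored funds
toy_funds = toy_funds[~toy_funds['FUNDID'].isin(IGNORED_FUNDS)]
print(toy_funds['FUNDID'].tolist())  # ['Arcanine', 'Zubat']
```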

In [37]:
combined_df.fillna('missing', inplace=True)

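### Closed funds to remove from the missing-at-Vestwell results
Namely `MedichamMega Medicham` and `Manectric` (program 1). Neither appears in the expected "Funds missing at Vestwell" list for `model_id` 268.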

In [39]:
partner[partner.partner_program_id==1]

| | PLANID | PLANINVCLOSEDATE | FUNDID | partner_program_id |
|---|---|---|---|---|
| 427 | VWPALL000076 | active | Medicham | 1 |
| 428 | VWPALL000076 | active | Hitmontop | 1 |
| 429 | VWPALL000076 | active | Smoochum | 1 |
| 430 | VWPALL000076 | active | Diglett | 1 |
| 431 | VWPALL000076 | 01/15/2019 | MedichamMega Medicham | 1 |
| 432 | VWPALL000076 | active | Gastly | 1 |
| 433 | VWPALL000076 | active | Teddiursa | 1 |
| 434 | VWPALL000076 | active | Meowth | 1 |
| 435 | VWPALL000076 | active | Sneasel | 1 |
| 436 | VWPALL000076 | active | Electrike | 1 |


In [40]:
vw_df[vw_df.model_id==268]

| | model_id | program_id | model_props_id | symbol |
|---|---|---|---|---|
| 40 | 268 | 1 | 1899 | Persian |
| 41 | 268 | 1 | 1897 | Diglett |
| 42 | 268 | 1 | 1900 | Psyduck |
| 43 | 268 | 1 | 1896 | Rattata |
| 44 | 268 | 1 | 1898 | Dugtrio |


In [41]:
combined_df[combined_df.model_id==268]

| | PLANID | PLANINVCLOSEDATE | FUNDID | partner_program_id | model_id | program_id | model_props_id | symbol |
|---|---|---|---|---|---|---|---|---|
| 485 | VWPALL000076 | active | Diglett | 1 | 268 | 1 | 1897 | Diglett |
| 562 | missing | missing | missing | missing | 268 | 1 | 1899 | Persian |
| 569 | missing | missing | missing | missing | 268 | 1 | 1900 | Psyduck |
| 578 | missing | missing | missing | missing | 268 | 1 | 1896 | Rattata |
| 587 | missing | missing | missing | missing | 268 | 1 | 1898 | Dugtrio |


In [42]:
combined_df[combined_df.FUNDID=='MedichamMega Medicham']

| | PLANID | PLANINVCLOSEDATE | FUNDID | partner_program_id | model_id | program_id | model_props_id | symbol |
|---|---|---|---|---|---|---|---|---|
| 424 | VW0029000243 | 01/15/2019 | MedichamMega Medicham | 29 | missing | missing | missing | missing |
| 495 | VWPALL000076 | 01/15/2019 | MedichamMega Medicham | 1 | missing | missing | missing | missing |


In [43]:
combined_df[combined_df.FUNDID=='ManectricMega Manectric']

| | PLANID | PLANINVCLOSEDATE | FUNDID | partner_program_id | model_id | program_id | model_props_id | symbol |
|---|---|---|---|---|---|---|---|---|
| 505 | VWPALL000076 | active | ManectricMega Manectric | 1 | missing | missing | missing | missing |
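The sketch below shows one way to keep closed funds out of the missing-at-Vestwell results. It assumes that any `PLANINVCLOSEDATE` that parses as a `MM/DD/YYYY` date marks a closed fund. The approach has three parts:

* Split each program's partner funds into an active set and a closed set.
* Compare each model only against the active set.
* Report a closed fund only when the Vestwell model still holds it.

The rows are made up and loosely modeled on program 1:

```python
import pandas as pd

# Made-up partner rows for one program
toy_partner = pd.DataFrame({
    'partner_program_id': [1, 1, 1],
    'FUNDID': ['Diglett', 'MedichamMega Medicham', 'ManectricMega Manectric'],
    'PLANINVCLOSEDATE': ['active', '01/15/2019', 'active'],
})
# A parseable date means closed; 'active' becomes NaT and counts as open
is_closed = pd.to_datetime(toy_partner['PLANINVCLOSEDATE'], format='%m/%d/%Y',
                           errors='coerce').notna()

active = toy_partner[~is_closed].groupby('partner_program_id')['FUNDID'].agg(set)
closed = toy_partner[is_closed].groupby('partner_program_id')['FUNDID'].agg(set)

model_funds = {'Diglett', 'Persian'}  # one hypothetical Vestwell model in program 1
missing_at_vw = sorted(active.get(1, set()) - model_funds)
fund_closed = sorted(closed.get(1, set()) & model_funds)
print(missing_at_vw, fund_closed)  # ['ManectricMega Manectric'] []
```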


### Group and combine

In [44]:
df1 =combined_df.groupby(['partner_program_id','program_id','model_id'])['FUNDID'].apply(','.join).reset_index()

In [45]:
df2 = combined_df.groupby(['partner_program_id','program_id','model_id'])['symbol'].apply(','.join).reset_index()

In [46]:
df1

| | partner_program_id | program_id | model_id | FUNDID |
|---|---|---|---|---|
| 0 | 1 | 1 | 260 | Diglett,Torchic |
| 1 | 1 | 1 | 261 | Diglett |
| 2 | 1 | 1 | 262 | Diglett |
| 3 | 1 | 1 | 263 | Diglett |
| 4 | 1 | 1 | 264 | Diglett |
| ... | ... | ... | ... | ... |
| 212 | missing | 29 | 352 | missing,missing,missing,missing,missing,missing |
| 213 | missing | 29 | 353 | missing,missing,missing,missing,missing,missing |
| 214 | missing | 29 | 354 | missing,missing,missing,missing,missing,missing |
| 215 | missing | 29 | 355 | missing,missing,missing,missing,missing,missin... |


In [47]:
df2

| | partner_program_id | program_id | model_id | symbol |
|---|---|---|---|---|
| 0 | 1 | 1 | 260 | Diglett,Torchic |
| 1 | 1 | 1 | 261 | Diglett |
| 2 | 1 | 1 | 262 | Diglett |
| 3 | 1 | 1 | 263 | Diglett |
| 4 | 1 | 1 | 264 | Diglett |
| ... | ... | ... | ... | ... |
| 212 | missing | 29 | 352 | Rattata,Dugtrio,Psyduck,Persian,Paras,Phanpy |
| 213 | missing | 29 | 353 | Rattata,Dugtrio,Psyduck,Persian,Paras,Phanpy |
| 214 | missing | 29 | 354 | Rattata,Psyduck,Paras,Venonat,Phanpy,Donphan |
| 215 | missing | 29 | 355 | Rattata,Dugtrio,Psyduck,Paras,Venonat,Phanpy,P... |


In [48]:
c_df = pd.merge(df1, df2,  how='outer',on=['partner_program_id','program_id','model_id'])

In [51]:
c_df

| | partner_program_id | program_id | model_id | FUNDID | symbol |
|---|---|---|---|---|---|
| 0 | 1 | 1 | 260 | Diglett,Torchic | Diglett,Torchic |
| 1 | 1 | 1 | 261 | Diglett | Diglett |
| 2 | 1 | 1 | 262 | Diglett | Diglett |
| 3 | 1 | 1 | 263 | Diglett | Diglett |
| 4 | 1 | 1 | 264 | Diglett | Diglett |
| 5 | 1 | 1 | 265 | Diglett | Diglett |
| 6 | 1 | 1 | 266 | Diglett | Diglett |
| 7 | 1 | 1 | 267 | Diglett | Diglett |
| 8 | 1 | 1 | 268 | Diglett | Diglett |
| 9 | 1 | 1 | 269 | Hitmontop,Smoochum,Diglett,Gastly,Teddiursa,Meowth,Sneasel,Xatu,Growlithe,Smeargle,Stantler,Tyrogue | Hitmontop,Smoochum,Diglett,Gastly,Teddiursa,Meowth,Sneasel,Xatu,Growlithe,Smeargle,Stantler,Tyrogue |


### Make Output_df

#### Create frame for baseline for output_df

In [52]:
#will create output df with every unique program_id and model_id combination
output_df_1 = c_df.groupby(['program_id','model_id']).size().reset_index()

In [55]:
output_df_1

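| | program_id | model_id | 0 |
|---|---|---|---|
| 0 | 1 | 259 | 1 |
| 1 | 1 | 260 | 2 |
| 2 | 1 | 261 | 2 |
| 3 | 1 | 262 | 2 |
| 4 | 1 | 263 | 2 |
| 5 | 1 | 264 | 2 |
| 6 | 1 | 265 | 2 |
| 7 | 1 | 266 | 2 |
| 8 | 1 | 267 | 2 |
| 9 | 1 | 268 | 2 |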


In [56]:
output_df_1.drop(index=output_df_1[output_df_1.program_id=='missing'].index,inplace=True)

#### Assign initial funds_missing_at_partner column (part 1)

In [57]:
idx = c_df.loc[c_df.partner_program_id =='missing'].index
output_df_2 = c_df.loc[idx, ['program_id', 'model_id', 'symbol']]
output_df_2.rename(columns={"symbol": "funds_missing_at_partner"}, inplace=True)

In [58]:
output_df_2

| | program_id | model_id | funds_missing_at_partner |
|---|---|---|---|
| 146 | 1 | 259 | Paras,Phanpy,Venonat,Donphan,Porygon2 |
| 147 | 1 | 260 | Psyduck,Rattata,Paras,Phanpy,Venonat,Donphan |
| 148 | 1 | 261 | Psyduck,Rattata,Dugtrio,Paras,Phanpy,Venonat,Porygon2 |
| 149 | 1 | 262 | Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy,Venonat |
| 150 | 1 | 263 | Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy |
| 151 | 1 | 264 | Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy |
| 152 | 1 | 265 | Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy |
| 153 | 1 | 266 | Persian,Psyduck,Rattata,Dugtrio,Paras |
| 154 | 1 | 267 | Persian,Psyduck,Rattata,Dugtrio,Paras |
| 155 | 1 | 268 | Persian,Psyduck,Rattata,Dugtrio |


#### Join values to baseline output_df

In [59]:
#left join keeps all the unique program_id & model_id combinations from output_df_1
output_df = pd.merge(output_df_1, output_df_2, how='left', left_on=['program_id','model_id'], 
                     right_on = ['program_id','model_id'])
output_df.drop(columns=[0], inplace=True)

In [60]:
output_df

| | program_id | model_id | funds_missing_at_partner |
|---|---|---|---|
| 0 | 1 | 259 | Paras,Phanpy,Venonat,Donphan,Porygon2 |
| 1 | 1 | 260 | Psyduck,Rattata,Paras,Phanpy,Venonat,Donphan |
| 2 | 1 | 261 | Psyduck,Rattata,Dugtrio,Paras,Phanpy,Venonat,Porygon2 |
| 3 | 1 | 262 | Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy,Venonat |
| 4 | 1 | 263 | Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy |
| 5 | 1 | 264 | Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy |
| 6 | 1 | 265 | Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy |
| 7 | 1 | 266 | Persian,Psyduck,Rattata,Dugtrio,Paras |
| 8 | 1 | 267 | Persian,Psyduck,Rattata,Dugtrio,Paras |
| 9 | 1 | 268 | Persian,Psyduck,Rattata,Dugtrio |


#### Assign funds_missing_at_partner column (part 2)
- Edge case: a Vestwell `program_id` that does not appear anywhere in the partner data

In [61]:
#gets list of program_id completely missing in partner.partner_program_id db
missing_program_id_partner = []
for i in c_df.program_id.unique():
    if i not in c_df.partner_program_id.unique():
        missing_program_id_partner.append(i)
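The membership loop above is a set difference. The sketch below uses hypothetical id lists in place of `c_df.program_id` and `c_df.partner_program_id`. Because `1 == 1.0` in Python, integer and float ids compare equal, and the notebook's `'missing'` placeholder is removed explicitly:

```python
# Hypothetical ids standing in for c_df.program_id and c_df.partner_program_id
vw_program_ids = [1, 3, 4, 8, 'missing']
partner_program_ids = [1.0, 8.0, 'missing']

# Programs Vestwell has that the partner lacks entirely
programs_missing_at_partner = sorted(
    set(vw_program_ids) - set(partner_program_ids) - {'missing'}
)
print(programs_missing_at_partner)  # [3, 4]
```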

In [65]:
combined_df[combined_df.partner_program_id==3]

| | PLANID | PLANINVCLOSEDATE | FUNDID | partner_program_id | model_id | program_id | model_props_id | symbol |
|---|---|---|---|---|---|---|---|---|


In [62]:
missing_program_id_partner

[3.0, 4.0, 11.0, 22.0, 26.0]

In [66]:
for miss_id in missing_program_id_partner:
    rows = output_df.program_id == miss_id
    # union of the funds of every model in this program (cells are comma-joined strings)
    funds = set()
    for item in output_df.loc[rows, 'funds_missing_at_partner'].dropna():
        funds.update(item.split(','))
    # assign a copy of that union to each model_id row of the current program
    for index in output_df.loc[rows].index:
        output_df.at[index, 'funds_missing_at_partner'] = set(funds)

In [67]:
output_df[output_df.program_id == 3]#.funds_missing_at_partner

| | program_id | model_id | funds_missing_at_partner |
|---|---|---|---|
| 11 | 3 | 17 | {Weedle, Pidgeotto, PidgeotMega Pidgeot, Butterfree, BeedrillMega Beedrill, Pidgeot, Kakuna, Beedrill, Metapod, Pidgey, Caterpie} |
| 12 | 3 | 18 | {Weedle, Pidgeotto, PidgeotMega Pidgeot, Butterfree, BeedrillMega Beedrill, Pidgeot, Kakuna, Beedrill, Metapod, Pidgey, Caterpie} |
| 13 | 3 | 19 | {Weedle, Pidgeotto, PidgeotMega Pidgeot, Butterfree, BeedrillMega Beedrill, Pidgeot, Kakuna, Beedrill, Metapod, Pidgey, Caterpie} |
| 14 | 3 | 20 | {Weedle, Pidgeotto, PidgeotMega Pidgeot, Butterfree, BeedrillMega Beedrill, Pidgeot, Kakuna, Beedrill, Metapod, Pidgey, Caterpie} |
| 15 | 3 | 21 | {Weedle, Pidgeotto, PidgeotMega Pidgeot, Butterfree, BeedrillMega Beedrill, Pidgeot, Kakuna, Beedrill, Metapod, Pidgey, Caterpie} |
| 16 | 3 | 22 | {Weedle, Pidgeotto, PidgeotMega Pidgeot, Butterfree, BeedrillMega Beedrill, Pidgeot, Kakuna, Beedrill, Metapod, Pidgey, Caterpie} |
| 17 | 3 | 23 | {Weedle, Pidgeotto, PidgeotMega Pidgeot, Butterfree, BeedrillMega Beedrill, Pidgeot, Kakuna, Beedrill, Metapod, Pidgey, Caterpie} |
| 18 | 3 | 24 | {Weedle, Pidgeotto, PidgeotMega Pidgeot, Butterfree, BeedrillMega Beedrill, Pidgeot, Kakuna, Beedrill, Metapod, Pidgey, Caterpie} |
| 19 | 3 | 28 | {Weedle, Pidgeotto, PidgeotMega Pidgeot, Butterfree, BeedrillMega Beedrill, Pidgeot, Kakuna, Beedrill, Metapod, Pidgey, Caterpie} |


#### Create and assign funds_missing_at_vw column (part 1)

##### Initial assignment
Get each (`partner_program_id`, `FUNDID`) pair where `program_id == 'missing'`, i.e. partner funds with no match in any Vestwell model of that program.

In [68]:
missing_vw = c_df.loc[c_df.program_id == 'missing',
                      ['partner_program_id', 'FUNDID']].set_index('partner_program_id')

In [69]:
# look up the partner FUNDIDs that have no Vestwell match for this program_id
# (missing_vw is indexed by partner_program_id) and return them as one string
def pop_missing_at_vw_part_one(partner_program_id):
    try:
        return missing_vw.at[partner_program_id, 'FUNDID']
    # this program_id has no unmatched partner funds: keep it empty for now
    except KeyError:
        return np.nan

In [70]:
missing_vw

| partner_program_id | FUNDID |
|---|---|
| 1.0 | MedichamMega Medicham,Manectric,ManectricMega Manectric |
| 8.0 | Carvanha |
| 9.0 | Sharpedo |
| 15.0 | Charmander,SharpedoMega Sharpedo,CharizardMega Charizard Y,BlastoiseMega Blastoise,CharizardMega Charizard X,Wailmer |
| 24.0 | Wailord |
| 27.0 | Plusle,Sharpedo |
| 28.0 | Bulbasaur,Plusle |
| 29.0 | MedichamMega Medicham,Minun,Volbeat,Illumise,Venomoth,Roselia,Parasect,Golduck,Gulpin |
| 30.0 | Numel |


In [71]:
#assign the missing FUNDIDS to all program_ids
output_df['funds_missing_at_vw'] = output_df.program_id.apply(pop_missing_at_vw_part_one)
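You can also express the try/except lookup in `pop_missing_at_vw_part_one` with `Series.map`. It looks each value up in another Series' index and returns NaN for absent keys. The sketch below uses made-up data; `unmatched` and `program_ids` are illustrative names. It assumes the mapping Series has a unique index and that both sides use the same id type:

```python
import pandas as pd

# Partner funds with no Vestwell match, indexed by program id (made-up values)
unmatched = pd.Series({1: 'X,Y', 8: 'Z'})
program_ids = pd.Series([1, 1, 3, 8])

# Each program_id is looked up in unmatched's index; program 3 has no entry -> NaN
looked_up = program_ids.map(unmatched)
print(looked_up.tolist())  # ['X,Y', 'X,Y', nan, 'Z']
```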

In [72]:
output_df

| | program_id | model_id | funds_missing_at_partner | funds_missing_at_vw |
|---|---|---|---|---|
| 0 | 1 | 259 | Paras,Phanpy,Venonat,Donphan,Porygon2 | MedichamMega Medicham,Manectric,ManectricMega Manectric |
| 1 | 1 | 260 | Psyduck,Rattata,Paras,Phanpy,Venonat,Donphan | MedichamMega Medicham,Manectric,ManectricMega Manectric |
| 2 | 1 | 261 | Psyduck,Rattata,Dugtrio,Paras,Phanpy,Venonat,Porygon2 | MedichamMega Medicham,Manectric,ManectricMega Manectric |
| 3 | 1 | 262 | Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy,Venonat | MedichamMega Medicham,Manectric,ManectricMega Manectric |
| 4 | 1 | 263 | Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy | MedichamMega Medicham,Manectric,ManectricMega Manectric |
| 5 | 1 | 264 | Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy | MedichamMega Medicham,Manectric,ManectricMega Manectric |
| 6 | 1 | 265 | Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy | MedichamMega Medicham,Manectric,ManectricMega Manectric |
| 7 | 1 | 266 | Persian,Psyduck,Rattata,Dugtrio,Paras | MedichamMega Medicham,Manectric,ManectricMega Manectric |
| 8 | 1 | 267 | Persian,Psyduck,Rattata,Dugtrio,Paras | MedichamMega Medicham,Manectric,ManectricMega Manectric |
| 9 | 1 | 268 | Persian,Psyduck,Rattata,Dugtrio | MedichamMega Medicham,Manectric,ManectricMega Manectric |


In [None]:
# c_df.loc[c_df.program_id =='missing'].index

In [None]:
# c_df.loc[c_df.program_id =='missing']

##### Populate funds_missing_at_vw for every model (part 2)

In [73]:
c_df

| | partner_program_id | program_id | model_id | FUNDID | symbol |
|---|---|---|---|---|---|
| 0 | 1 | 1 | 260 | Diglett,Torchic | Diglett,Torchic |
| 1 | 1 | 1 | 261 | Diglett | Diglett |
| 2 | 1 | 1 | 262 | Diglett | Diglett |
| 3 | 1 | 1 | 263 | Diglett | Diglett |
| 4 | 1 | 1 | 264 | Diglett | Diglett |
| 5 | 1 | 1 | 265 | Diglett | Diglett |
| 6 | 1 | 1 | 266 | Diglett | Diglett |
| 7 | 1 | 1 | 267 | Diglett | Diglett |
| 8 | 1 | 1 | 268 | Diglett | Diglett |
| 9 | 1 | 1 | 269 | Hitmontop,Smoochum,Diglett,Gastly,Teddiursa,Meowth,Sneasel,Xatu,Growlithe,Smeargle,Stantler,Tyrogue | Hitmontop,Smoochum,Diglett,Gastly,Teddiursa,Meowth,Sneasel,Xatu,Growlithe,Smeargle,Stantler,Tyrogue |


##### Create fund_agg sets for each unique (partner_program_id, program_id) pair
We ignore 'missing' values by replacing them with NaN; `groupby` then drops the rows whose keys are NaN.

In [74]:
remainder_groupby = c_df.replace({'missing': np.nan}).groupby(['partner_program_id',
                                                               'program_id'])['FUNDID'].apply(','.join).reset_index()


In [75]:
remainder_groupby.rename(columns={"FUNDID": "fund_agg"}, inplace=True)
remainder_groupby.fund_agg = remainder_groupby.fund_agg.apply(lambda x: set(x.split(',')))
remainder_groupby

| | partner_program_id | program_id | fund_agg |
|---|---|---|---|
| 0 | 1.0 | 1.0 | {Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle} |
| 1 | 8.0 | 8.0 | {Arbok, Nidorina, Spearow, Primeape, Golbat, Nidoqueen, Ninetales, Sandslash, Nidoking, Zubat, Raticate, CharizardMega Charizard X, Nidorino, Vulpix, Clefairy, Arcanine, Vileplume, Ekans, Fearow, Wigglytuff, Gloom, Clefable, Sandshrew, Growlithe, Pikachu, Jigglypuff, Raichu, Oddish} |
| 2 | 9.0 | 9.0 | {Rapidash, Weepinbell, BlastoiseMega Blastoise, Charmander, Magneton, Machoke, Poliwhirl, SlowbroMega Slowbro, Geodude, Poliwrath, Poliwag, Magnemite, Zubat, Slowbro, Ponyta, Doduo, Tentacool, Venusaur, Tentacruel, Golem, Graveler, Blastoise, VenusaurMega Venusaur, Squirtle, Slowpoke, Machop, Alakazam, Machamp, Farfetch'd, Abra, Bellsprout, Victreebel, Kadabra, AlakazamMega Alakazam} |
| 3 | 10.0 | 10.0 | {Venusaur, Bulbasaur, BlastoiseMega Blastoise, Charmander, Blastoise, VenusaurMega Venusaur, Ivysaur, Squirtle, CharizardMega Charizard Y, Charizard, Wartortle, CharizardMega Charizard X, Charmeleon} |
| 4 | 12.0 | 12.0 | {Rapidash, BlastoiseMega Blastoise, Dodrio, Charmander, CharizardMega Charizard Y, Magneton, Seel, SlowbroMega Slowbro, Magnemite, Slowbro, Grimer, Ponyta, CharizardMega Charizard X, Doduo, Shellder, Dewgong, Golem, Blastoise, VenusaurMega Venusaur, Squirtle, Slowpoke, Charmeleon, Cloyster, Muk, Farfetch'd, Charizard} |
| 5 | 13.0 | 13.0 | {Rapidash, Rhyhorn, Haunter, Koffing, Magneton, SlowbroMega Slowbro, Geodude, Magnemite, Slowbro, Ponyta, CharizardMega Charizard X, Doduo, Hitmonlee, Marowak, Hitmonchan, Voltorb, Slowpoke, Exeggcute, Farfetch'd, Weezing, Lickitung, Electrode, Cubone, Exeggutor} |
| 6 | 14.0 | 14.0 | {Bulbasaur, Quilava, Charmander, Mew, Meganium, Dragonite, Aerodactyl, Articuno, Chikorita, Zapdos, Cyndaquil, Vulpix, Mewtwo, Snorlax, AerodactylMega Aerodactyl, Magikarp, Bayleef, Dragonair, MewtwoMega Mewtwo Y, MewtwoMega Mewtwo X, Gastly, Moltres, Dratini} |
| 7 | 15.0 | 15.0 | {Quagsire, Arbok, Nidorina, Sunflora, Misdreavus, Sunkern, Spearow, Slowking, Murkrow, Nidoqueen, Yanma, Sandslash, Politoed, Jumpluff, Raticate, Nidorino, Hoppip, Ekans, Fearow, Umbreon, Magikarp, Aipom, Skiploom, Sandshrew, Pikachu, Raichu, Wooper, Espeon} |
| 8 | 16.0 | 16.0 | {Forretress, Gligar, CharizardMega Charizard Y, Snorlax, Goldeen, BlastoiseMega Blastoise, Pineco, Wobbuffet, Gastly, Dunsparce, Snubbull, Unown, Girafarig, SteelixMega Steelix, VenusaurMega Venusaur, CharizardMega Charizard X, Totodile, Charmeleon} |
| 9 | 17.0 | 17.0 | {Flaaffy, Sudowoodo, Spinarak, Ampharos, Cleffa, Natu, Mareep, Chinchou, Pichu, Chikorita, Azumarill, Lanturn, Ariados, Igglybuff, Crobat, Marill, AmpharosMega Ampharos, Xatu, Togetic, Ledian, Togepi, Bellossom} |
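Because `combined_df` already has one fund per row, you can skip the join-then-split round trip by aggregating straight into sets. The sketch below runs on a made-up frame. With the real data, it would run on `combined_df` after `'missing'` has been replaced with NaN (as above), so that `groupby` drops the unmatched rows:

```python
import numpy as np
import pandas as pd

# Made-up rows shaped like combined_df: one fund per row, NaN keys for unmatched rows
toy_combined = pd.DataFrame({
    'partner_program_id': [1, 1, 1, np.nan],
    'program_id': [1, 1, 1, 3],
    'FUNDID': ['Diglett', 'Torchic', 'Diglett', np.nan],
})

# One set of funds per (partner_program_id, program_id); the NaN-key row is dropped
fund_agg_sketch = toy_combined.groupby(['partner_program_id', 'program_id'])['FUNDID'].agg(set)
print(fund_agg_sketch)
```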


In [80]:
# cast to allow merge with c_df
remainder_groupby=remainder_groupby.astype({'partner_program_id': 'object','program_id': 'object'})

In [81]:
# c_df.astype({'partner_program_id': 'float','program_id': 'float'})

In [82]:
remainder_groupby.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18 entries, 0 to 17
Data columns (total 3 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   partner_program_id  18 non-null     object
 1   program_id          18 non-null     object
 2   fund_agg            18 non-null     object
dtypes: object(3)
memory usage: 560.0+ bytes


In [83]:
c_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 217 entries, 0 to 216
Data columns (total 5 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   partner_program_id  217 non-null    object
 1   program_id          217 non-null    object
 2   model_id            217 non-null    object
 3   FUNDID              217 non-null    object
 4   symbol              217 non-null    object
dtypes: object(5)
memory usage: 20.2+ KB


In [84]:
missing_df = pd.merge(remainder_groupby, c_df.replace({'missing': np.nan}),how='inner',
                      left_on=['program_id', 'partner_program_id'], right_on = ['program_id', 'partner_program_id'])
missing_df.drop(columns='FUNDID',inplace=True)
# missing_df.dropna(inplace=True)

In [85]:
missing_df

| | partner_program_id | program_id | fund_agg | model_id | symbol |
|---|---|---|---|---|---|
| 0 | 1 | 1 | {Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle} | 260.0 | Diglett,Torchic |
| 1 | 1 | 1 | {Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle} | 261.0 | Diglett |
| 2 | 1 | 1 | {Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle} | 262.0 | Diglett |
| 3 | 1 | 1 | {Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle} | 263.0 | Diglett |
| 4 | 1 | 1 | {Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle} | 264.0 | Diglett |
| 5 | 1 | 1 | {Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle} | 265.0 | Diglett |
| 6 | 1 | 1 | {Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle} | 266.0 | Diglett |
| 7 | 1 | 1 | {Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle} | 267.0 | Diglett |
| 8 | 1 | 1 | {Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle} | 268.0 | Diglett |
| 9 | 1 | 1 | {Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle} | 269.0 | Hitmontop,Smoochum,Diglett,Gastly,Teddiursa,Meowth,Sneasel,Xatu,Growlithe,Smeargle,Stantler,Tyrogue |


In [86]:
missing_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 137 entries, 0 to 136
Data columns (total 5 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   partner_program_id  137 non-null    object 
 1   program_id          137 non-null    object 
 2   fund_agg            137 non-null    object 
 3   model_id            137 non-null    float64
 4   symbol              137 non-null    object 
dtypes: float64(1), object(4)
memory usage: 6.4+ KB


In [87]:
missing_df.fund_agg = missing_df.fund_agg.apply(lambda x: list(x))

In [88]:
missing_df

| | partner_program_id | program_id | fund_agg | model_id | symbol |
|---|---|---|---|---|---|
| 0 | 1 | 1 | [Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle] | 260.0 | Diglett,Torchic |
| 1 | 1 | 1 | [Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle] | 261.0 | Diglett |
| 2 | 1 | 1 | [Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle] | 262.0 | Diglett |
| 3 | 1 | 1 | [Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle] | 263.0 | Diglett |
| 4 | 1 | 1 | [Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle] | 264.0 | Diglett |
| 5 | 1 | 1 | [Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle] | 265.0 | Diglett |
| 6 | 1 | 1 | [Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle] | 266.0 | Diglett |
| 7 | 1 | 1 | [Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle] | 267.0 | Diglett |
| 8 | 1 | 1 | [Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle] | 268.0 | Diglett |
| 9 | 1 | 1 | [Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle] | 269.0 | Hitmontop,Smoochum,Diglett,Gastly,Teddiursa,Meowth,Sneasel,Xatu,Growlithe,Smeargle,Stantler,Tyrogue |


In [91]:
c_df

| | partner_program_id | program_id | model_id | FUNDID | symbol |
|---|---|---|---|---|---|
| 0 | 1 | 1 | 260 | Diglett,Torchic | Diglett,Torchic |
| 1 | 1 | 1 | 261 | Diglett | Diglett |
| 2 | 1 | 1 | 262 | Diglett | Diglett |
| 3 | 1 | 1 | 263 | Diglett | Diglett |
| 4 | 1 | 1 | 264 | Diglett | Diglett |
| 5 | 1 | 1 | 265 | Diglett | Diglett |
| 6 | 1 | 1 | 266 | Diglett | Diglett |
| 7 | 1 | 1 | 267 | Diglett | Diglett |
| 8 | 1 | 1 | 268 | Diglett | Diglett |
| 9 | 1 | 1 | 269 | Hitmontop,Smoochum,Diglett,Gastly,Teddiursa,Meowth,Sneasel,Xatu,Growlithe,Smeargle,Stantler,Tyrogue | Hitmontop,Smoochum,Diglett,Gastly,Teddiursa,Meowth,Sneasel,Xatu,Growlithe,Smeargle,Stantler,Tyrogue |


In [89]:
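def what_is_missing(fund_agg, symbol):
    # partner funds (fund_agg) that this Vestwell model's symbols don't cover
    remaining = list(fund_agg)  # copy, so the row's own list is not mutated
    # symbol is a comma-separated string, or a single non-string value (e.g. NaN)
    symbols = symbol.split(',') if isinstance(symbol, str) else [symbol]
    for fund in symbols:
        if fund in remaining:
            remaining.remove(fund)
    return remaining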
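
The same "partner funds not held by the model" step can also be written as a filter against a set of symbols. This is a minimal sketch, not the notebook's code: `funds_not_in_model` is a hypothetical name, and unlike `.remove()` it drops every duplicate of a symbol rather than only the first one, which assumes duplicates in `fund_agg` carry no meaning.

```python
def funds_not_in_model(fund_agg, symbol):
    """Hypothetical helper: partner funds for the program that the Vestwell model does not hold.

    fund_agg: iterable of partner FUNDIDs for the program.
    symbol: comma-separated string of the model's Vestwell symbols, or a single non-string value.
    """
    symbols = set(symbol.split(',')) if isinstance(symbol, str) else {symbol}
    # keep fund_agg's order so the result is stable and easy to compare by eye
    return [fund for fund in fund_agg if fund not in symbols]


print(funds_not_in_model(['Growlithe', 'Torchic', 'Diglett'], 'Diglett,Torchic'))
# ['Growlithe']
```

Membership tests against a set are constant time, so this scales better than repeated `list.remove()` calls when a program has many funds.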

In [90]:
missing_df['missing'] = missing_df.apply(lambda x: what_is_missing(x.fund_agg, x.symbol), axis=1)

In [92]:
missing_df

Unnamed: 0,partner_program_id,program_id,fund_agg,model_id,symbol,missing
0,1,1,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle]",260.0,"Diglett,Torchic","[Growlithe, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Smoochum, Smeargle]"
1,1,1,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle]",261.0,Diglett,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Smoochum, Smeargle]"
2,1,1,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle]",262.0,Diglett,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Smoochum, Smeargle]"
3,1,1,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle]",263.0,Diglett,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Smoochum, Smeargle]"
4,1,1,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle]",264.0,Diglett,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Smoochum, Smeargle]"
5,1,1,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle]",265.0,Diglett,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Smoochum, Smeargle]"
6,1,1,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle]",266.0,Diglett,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Smoochum, Smeargle]"
7,1,1,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle]",267.0,Diglett,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Smoochum, Smeargle]"
8,1,1,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle]",268.0,Diglett,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Smoochum, Smeargle]"
9,1,1,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle]",269.0,"Hitmontop,Smoochum,Diglett,Gastly,Teddiursa,Meowth,Sneasel,Xatu,Growlithe,Smeargle,Stantler,Tyrogue",[Torchic]


#### Now combine with output_df

In [94]:
missing_df = missing_df.astype({'model_id': 'object', 'program_id': 'object'})

In [95]:
missing_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 137 entries, 0 to 136
Data columns (total 6 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   partner_program_id  137 non-null    object
 1   program_id          137 non-null    object
 2   fund_agg            137 non-null    object
 3   model_id            137 non-null    object
 4   symbol              137 non-null    object
 5   missing             137 non-null    object
dtypes: object(6)
memory usage: 7.5+ KB


In [96]:
output_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 187 entries, 0 to 186
Data columns (total 4 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   program_id                187 non-null    object
 1   model_id                  187 non-null    object
 2   funds_missing_at_partner  71 non-null     object
 3   funds_missing_at_vw       85 non-null     object
dtypes: object(4)
memory usage: 12.3+ KB


In [97]:
# both frames share the key column names, so `on=` replaces matching left_on/right_on lists
output_final = pd.merge(output_df, missing_df, how='outer', on=['program_id', 'model_id'])
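
After an outer merge it helps to know which side each row came from. pandas' `merge` accepts `indicator=True`, which adds a categorical `_merge` column (`left_only`, `right_only` or `both`). The sketch below uses hypothetical toy frames standing in for `output_df` and `missing_df`; on the real data, the rows whose partner columns come back NaN would likely be the `left_only` ones.

```python
import pandas as pd

# Hypothetical toy frames standing in for output_df and missing_df
left = pd.DataFrame({'program_id': [1, 1], 'model_id': [259, 260],
                     'funds_missing_at_vw': ['Manectric', 'Manectric']})
right = pd.DataFrame({'program_id': [1], 'model_id': [260],
                      'symbol': ['Diglett,Torchic']})

# indicator=True tags every output row with where its key was found
merged = pd.merge(left, right, how='outer', on=['program_id', 'model_id'], indicator=True)
print(merged[['program_id', 'model_id', '_merge']])
```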

In [98]:
output_final 

Unnamed: 0,program_id,model_id,funds_missing_at_partner,funds_missing_at_vw,partner_program_id,fund_agg,symbol,missing
0,1,259,"Paras,Phanpy,Venonat,Donphan,Porygon2","MedichamMega Medicham,Manectric,ManectricMega Manectric",,,,
1,1,260,"Psyduck,Rattata,Paras,Phanpy,Venonat,Donphan","MedichamMega Medicham,Manectric,ManectricMega Manectric",1.0,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle]","Diglett,Torchic","[Growlithe, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Smoochum, Smeargle]"
2,1,261,"Psyduck,Rattata,Dugtrio,Paras,Phanpy,Venonat,Porygon2","MedichamMega Medicham,Manectric,ManectricMega Manectric",1.0,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle]",Diglett,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Smoochum, Smeargle]"
3,1,262,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy,Venonat","MedichamMega Medicham,Manectric,ManectricMega Manectric",1.0,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle]",Diglett,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Smoochum, Smeargle]"
4,1,263,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy","MedichamMega Medicham,Manectric,ManectricMega Manectric",1.0,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle]",Diglett,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Smoochum, Smeargle]"
5,1,264,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy","MedichamMega Medicham,Manectric,ManectricMega Manectric",1.0,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle]",Diglett,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Smoochum, Smeargle]"
6,1,265,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy","MedichamMega Medicham,Manectric,ManectricMega Manectric",1.0,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle]",Diglett,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Smoochum, Smeargle]"
7,1,266,"Persian,Psyduck,Rattata,Dugtrio,Paras","MedichamMega Medicham,Manectric,ManectricMega Manectric",1.0,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle]",Diglett,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Smoochum, Smeargle]"
8,1,267,"Persian,Psyduck,Rattata,Dugtrio,Paras","MedichamMega Medicham,Manectric,ManectricMega Manectric",1.0,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle]",Diglett,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Smoochum, Smeargle]"
9,1,268,"Persian,Psyduck,Rattata,Dugtrio","MedichamMega Medicham,Manectric,ManectricMega Manectric",1.0,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle]",Diglett,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Smoochum, Smeargle]"


In [99]:
# use remainder_groupby's fund_agg to fill the NaN cells of output_final's missing column
remainder_groupby.set_index('partner_program_id', inplace=True)
remainder_groupby

Unnamed: 0_level_0,program_id,fund_agg
partner_program_id,Unnamed: 1_level_1,Unnamed: 2_level_1
1.0,1,"{Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle}"
8.0,8,"{Arbok, Nidorina, Spearow, Primeape, Golbat, Nidoqueen, Ninetales, Sandslash, Nidoking, Zubat, Raticate, CharizardMega Charizard X, Nidorino, Vulpix, Clefairy, Arcanine, Vileplume, Ekans, Fearow, Wigglytuff, Gloom, Clefable, Sandshrew, Growlithe, Pikachu, Jigglypuff, Raichu, Oddish}"
9.0,9,"{Rapidash, Weepinbell, BlastoiseMega Blastoise, Charmander, Magneton, Machoke, Poliwhirl, SlowbroMega Slowbro, Geodude, Poliwrath, Poliwag, Magnemite, Zubat, Slowbro, Ponyta, Doduo, Tentacool, Venusaur, Tentacruel, Golem, Graveler, Blastoise, VenusaurMega Venusaur, Squirtle, Slowpoke, Machop, Alakazam, Machamp, Farfetch'd, Abra, Bellsprout, Victreebel, Kadabra, AlakazamMega Alakazam}"
10.0,10,"{Venusaur, Bulbasaur, BlastoiseMega Blastoise, Charmander, Blastoise, VenusaurMega Venusaur, Ivysaur, Squirtle, CharizardMega Charizard Y, Charizard, Wartortle, CharizardMega Charizard X, Charmeleon}"
12.0,12,"{Rapidash, BlastoiseMega Blastoise, Dodrio, Charmander, CharizardMega Charizard Y, Magneton, Seel, SlowbroMega Slowbro, Magnemite, Slowbro, Grimer, Ponyta, CharizardMega Charizard X, Doduo, Shellder, Dewgong, Golem, Blastoise, VenusaurMega Venusaur, Squirtle, Slowpoke, Charmeleon, Cloyster, Muk, Farfetch'd, Charizard}"
13.0,13,"{Rapidash, Rhyhorn, Haunter, Koffing, Magneton, SlowbroMega Slowbro, Geodude, Magnemite, Slowbro, Ponyta, CharizardMega Charizard X, Doduo, Hitmonlee, Marowak, Hitmonchan, Voltorb, Slowpoke, Exeggcute, Farfetch'd, Weezing, Lickitung, Electrode, Cubone, Exeggutor}"
14.0,14,"{Bulbasaur, Quilava, Charmander, Mew, Meganium, Dragonite, Aerodactyl, Articuno, Chikorita, Zapdos, Cyndaquil, Vulpix, Mewtwo, Snorlax, AerodactylMega Aerodactyl, Magikarp, Bayleef, Dragonair, MewtwoMega Mewtwo Y, MewtwoMega Mewtwo X, Gastly, Moltres, Dratini}"
15.0,15,"{Quagsire, Arbok, Nidorina, Sunflora, Misdreavus, Sunkern, Spearow, Slowking, Murkrow, Nidoqueen, Yanma, Sandslash, Politoed, Jumpluff, Raticate, Nidorino, Hoppip, Ekans, Fearow, Umbreon, Magikarp, Aipom, Skiploom, Sandshrew, Pikachu, Raichu, Wooper, Espeon}"
16.0,16,"{Forretress, Gligar, CharizardMega Charizard Y, Snorlax, Goldeen, BlastoiseMega Blastoise, Pineco, Wobbuffet, Gastly, Dunsparce, Snubbull, Unown, Girafarig, SteelixMega Steelix, VenusaurMega Venusaur, CharizardMega Charizard X, Totodile, Charmeleon}"
17.0,17,"{Flaaffy, Sudowoodo, Spinarak, Ampharos, Cleffa, Natu, Mareep, Chinchou, Pichu, Chikorita, Azumarill, Lanturn, Ariados, Igglybuff, Crobat, Marill, AmpharosMega Ampharos, Xatu, Togetic, Ledian, Togepi, Bellossom}"


In [100]:
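# populate NaN in the missing col: rows that only exist in output_df get the
# program's aggregated partner funds (fund_agg) from remainder_groupby
def get_missing_missing(program_id, missing):
    if isinstance(missing, float) and math.isnan(missing):
        if program_id in remainder_groupby.index:
            return remainder_groupby.at[program_id, 'fund_agg']
    return missing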
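
The object columns in this notebook mix NaN with lists, sets and strings, which is why a bare `math.isnan()` needs a guard: it raises `TypeError` on a list or string, and `pd.isna()` on a list returns an array rather than one bool. A minimal sketch of a scalar NaN check; `is_nan_cell` is a hypothetical helper name.

```python
import math


def is_nan_cell(value):
    """Hypothetical helper: True only for a scalar float NaN cell.

    Lists, sets and strings are never treated as NaN, so no exception
    handling is needed around math.isnan().
    """
    return isinstance(value, float) and math.isnan(value)


print(is_nan_cell(float('nan')), is_nan_cell(['Diglett']), is_nan_cell('nan'))
# True False False
```

`np.nan` and `numpy.float64` values are both `float` instances, so they pass the `isinstance` check as well.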

In [101]:
output_final.missing = output_final.apply(lambda x: get_missing_missing(x.program_id, x.missing), axis=1)

In [102]:
output_final.isna().sum()

program_id                    0
model_id                      0
funds_missing_at_partner    116
funds_missing_at_vw         102
partner_program_id           50
fund_agg                     50
symbol                       50
missing                      48
dtype: int64

In [103]:
output_final

Unnamed: 0,program_id,model_id,funds_missing_at_partner,funds_missing_at_vw,partner_program_id,fund_agg,symbol,missing
0,1,259,"Paras,Phanpy,Venonat,Donphan,Porygon2","MedichamMega Medicham,Manectric,ManectricMega Manectric",,,,"{Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle}"
1,1,260,"Psyduck,Rattata,Paras,Phanpy,Venonat,Donphan","MedichamMega Medicham,Manectric,ManectricMega Manectric",1.0,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle]","Diglett,Torchic","[Growlithe, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Smoochum, Smeargle]"
2,1,261,"Psyduck,Rattata,Dugtrio,Paras,Phanpy,Venonat,Porygon2","MedichamMega Medicham,Manectric,ManectricMega Manectric",1.0,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle]",Diglett,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Smoochum, Smeargle]"
3,1,262,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy,Venonat","MedichamMega Medicham,Manectric,ManectricMega Manectric",1.0,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle]",Diglett,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Smoochum, Smeargle]"
4,1,263,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy","MedichamMega Medicham,Manectric,ManectricMega Manectric",1.0,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle]",Diglett,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Smoochum, Smeargle]"
5,1,264,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy","MedichamMega Medicham,Manectric,ManectricMega Manectric",1.0,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle]",Diglett,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Smoochum, Smeargle]"
6,1,265,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy","MedichamMega Medicham,Manectric,ManectricMega Manectric",1.0,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle]",Diglett,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Smoochum, Smeargle]"
7,1,266,"Persian,Psyduck,Rattata,Dugtrio,Paras","MedichamMega Medicham,Manectric,ManectricMega Manectric",1.0,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle]",Diglett,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Smoochum, Smeargle]"
8,1,267,"Persian,Psyduck,Rattata,Dugtrio,Paras","MedichamMega Medicham,Manectric,ManectricMega Manectric",1.0,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle]",Diglett,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Smoochum, Smeargle]"
9,1,268,"Persian,Psyduck,Rattata,Dugtrio","MedichamMega Medicham,Manectric,ManectricMega Manectric",1.0,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle]",Diglett,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Smoochum, Smeargle]"


#### Append the `missing` column to funds_missing_at_vw and drop the helper columns


In [104]:
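def combine_vw_missing_and_missing_col(funds_missing_at_vw, missing):
    # `missing` is NaN where the program_id only exists at Vestwell: leave the row as-is
    if not isinstance(missing, (list, set)):
        return funds_missing_at_vw
    extra = ','.join(missing)
    # funds_missing_at_vw is NaN when nothing was missing at Vestwell before this step
    if not isinstance(funds_missing_at_vw, str):
        return extra
    # skip the separator when `missing` is empty, to avoid a trailing comma
    return funds_missing_at_vw + ',' + extra if extra else funds_missing_at_vw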

In [105]:
output_final.funds_missing_at_vw = output_final.apply(lambda x: combine_vw_missing_and_missing_col(x.funds_missing_at_vw, x.missing), axis=1)


In [106]:
output_final

Unnamed: 0,program_id,model_id,funds_missing_at_partner,funds_missing_at_vw,partner_program_id,fund_agg,symbol,missing
0,1,259,"Paras,Phanpy,Venonat,Donphan,Porygon2","MedichamMega Medicham,Manectric,ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Diglett,Smoochum,Smeargle",,,,"{Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle}"
1,1,260,"Psyduck,Rattata,Paras,Phanpy,Venonat,Donphan","MedichamMega Medicham,Manectric,ManectricMega Manectric,Growlithe,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle",1.0,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle]","Diglett,Torchic","[Growlithe, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Smoochum, Smeargle]"
2,1,261,"Psyduck,Rattata,Dugtrio,Paras,Phanpy,Venonat,Porygon2","MedichamMega Medicham,Manectric,ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle",1.0,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle]",Diglett,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Smoochum, Smeargle]"
3,1,262,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy,Venonat","MedichamMega Medicham,Manectric,ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle",1.0,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle]",Diglett,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Smoochum, Smeargle]"
4,1,263,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy","MedichamMega Medicham,Manectric,ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle",1.0,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle]",Diglett,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Smoochum, Smeargle]"
5,1,264,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy","MedichamMega Medicham,Manectric,ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle",1.0,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle]",Diglett,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Smoochum, Smeargle]"
6,1,265,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy","MedichamMega Medicham,Manectric,ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle",1.0,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle]",Diglett,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Smoochum, Smeargle]"
7,1,266,"Persian,Psyduck,Rattata,Dugtrio,Paras","MedichamMega Medicham,Manectric,ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle",1.0,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle]",Diglett,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Smoochum, Smeargle]"
8,1,267,"Persian,Psyduck,Rattata,Dugtrio,Paras","MedichamMega Medicham,Manectric,ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle",1.0,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle]",Diglett,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Smoochum, Smeargle]"
9,1,268,"Persian,Psyduck,Rattata,Dugtrio","MedichamMega Medicham,Manectric,ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle",1.0,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Diglett, Smoochum, Smeargle]",Diglett,"[Growlithe, Torchic, Teddiursa, Tyrogue, Stantler, Hitmontop, Xatu, Gastly, Sneasel, Meowth, Smoochum, Smeargle]"


In [107]:
output_final.columns

Index(['program_id', 'model_id', 'funds_missing_at_partner',
       'funds_missing_at_vw', 'partner_program_id', 'fund_agg', 'symbol',
       'missing'],
      dtype='object')

In [108]:
output_final = output_final.loc[:,['program_id', 'model_id', 'funds_missing_at_partner',
       'funds_missing_at_vw']]

# Step 4 - Check for any closed funds
Check each `model_id` in each `program_id` in Vestwell's data to see if our partner has indicated a fund has closed.  We don't care about funds that have closed that aren't in Vestwell's data.  Add this information to your output from step 3.
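
Read literally, this rule only keeps closed funds that the Vestwell model still holds. A minimal sketch of that reading for a single model, using plain Python sets; `reconcile_model` and its arguments are hypothetical, it assumes the inputs are already narrowed to one program and one model, and it is not the pandas pipeline used in this notebook.

```python
def reconcile_model(vw_symbols, partner_funds, partner_closed):
    """Hypothetical helper comparing one Vestwell model with its partner program.

    vw_symbols: symbols held by the Vestwell model
    partner_funds: FUNDIDs the partner lists for the program
    partner_closed: FUNDIDs that have a PLANINVCLOSEDATE at the partner
    """
    vw, partner, closed = set(vw_symbols), set(partner_funds), set(partner_closed)
    return {
        'funds_missing_at_partner': sorted(vw - partner),
        # closed funds are reported separately, not as missing at Vestwell
        'funds_missing_at_vw': sorted(partner - vw - closed),
        # Step 4 only cares about closed funds that Vestwell still holds
        'fund_closed': sorted(closed & vw),
    }


print(reconcile_model(['A', 'B', 'C'], ['B', 'C', 'D', 'E'], ['C', 'E']))
# {'funds_missing_at_partner': ['A'], 'funds_missing_at_vw': ['D'], 'fund_closed': ['C']}
```

Set differences ignore duplicates and ordering, and `sorted` makes the output deterministic from run to run.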

### Add a fund_closed column and populate it

In [109]:
output_final

Unnamed: 0,program_id,model_id,funds_missing_at_partner,funds_missing_at_vw
0,1,259,"Paras,Phanpy,Venonat,Donphan,Porygon2","MedichamMega Medicham,Manectric,ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Diglett,Smoochum,Smeargle"
1,1,260,"Psyduck,Rattata,Paras,Phanpy,Venonat,Donphan","MedichamMega Medicham,Manectric,ManectricMega Manectric,Growlithe,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle"
2,1,261,"Psyduck,Rattata,Dugtrio,Paras,Phanpy,Venonat,Porygon2","MedichamMega Medicham,Manectric,ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle"
3,1,262,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy,Venonat","MedichamMega Medicham,Manectric,ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle"
4,1,263,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy","MedichamMega Medicham,Manectric,ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle"
5,1,264,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy","MedichamMega Medicham,Manectric,ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle"
6,1,265,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy","MedichamMega Medicham,Manectric,ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle"
7,1,266,"Persian,Psyduck,Rattata,Dugtrio,Paras","MedichamMega Medicham,Manectric,ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle"
8,1,267,"Persian,Psyduck,Rattata,Dugtrio,Paras","MedichamMega Medicham,Manectric,ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle"
9,1,268,"Persian,Psyduck,Rattata,Dugtrio","MedichamMega Medicham,Manectric,ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle"


In [110]:
# index output_final by (program_id, model_id) and add a fund_closed column
# holding a separate empty list for every row
output_final.set_index(['program_id', 'model_id'], inplace=True)
output_final['fund_closed'] = np.empty((len(output_final), 0)).tolist()
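
The `np.empty((n, 0)).tolist()` trick matters because the next cell appends to each cell's list in place: every row needs its own list object. The sketch below contrasts it with the common `[[]] * n` pitfall, where all rows share one list.

```python
import numpy as np

n = 3
# np.empty((n, 0)) has n rows and zero columns; .tolist() turns each row into its own empty list
cells = np.empty((n, 0)).tolist()
cells[0].append('Clefairy')
print(cells)  # [['Clefairy'], [], []]

# pitfall: list multiplication repeats ONE list object, so every "cell" changes together
shared = [[]] * n
shared[0].append('Clefairy')
print(shared)  # [['Clefairy'], ['Clefairy'], ['Clefairy']]

# plain-Python equivalent of the numpy trick
cells_py = [[] for _ in range(n)]
```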

In [111]:
output_final

Unnamed: 0_level_0,Unnamed: 1_level_0,funds_missing_at_partner,funds_missing_at_vw,fund_closed
program_id,model_id,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
1.0,259.0,"Paras,Phanpy,Venonat,Donphan,Porygon2","MedichamMega Medicham,Manectric,ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Diglett,Smoochum,Smeargle",[]
1.0,260.0,"Psyduck,Rattata,Paras,Phanpy,Venonat,Donphan","MedichamMega Medicham,Manectric,ManectricMega Manectric,Growlithe,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle",[]
1.0,261.0,"Psyduck,Rattata,Dugtrio,Paras,Phanpy,Venonat,Porygon2","MedichamMega Medicham,Manectric,ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle",[]
1.0,262.0,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy,Venonat","MedichamMega Medicham,Manectric,ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle",[]
1.0,263.0,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy","MedichamMega Medicham,Manectric,ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle",[]
1.0,264.0,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy","MedichamMega Medicham,Manectric,ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle",[]
1.0,265.0,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy","MedichamMega Medicham,Manectric,ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle",[]
1.0,266.0,"Persian,Psyduck,Rattata,Dugtrio,Paras","MedichamMega Medicham,Manectric,ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle",[]
1.0,267.0,"Persian,Psyduck,Rattata,Dugtrio,Paras","MedichamMega Medicham,Manectric,ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle",[]
1.0,268.0,"Persian,Psyduck,Rattata,Dugtrio","MedichamMega Medicham,Manectric,ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle",[]


In [112]:
combined_df

Unnamed: 0,PLANID,PLANINVCLOSEDATE,FUNDID,partner_program_id,model_id,program_id,model_props_id,symbol
1,VW0008000039,active,Arcanine,8,65,8,311,Arcanine
2,VW0008000039,11/01/2018,Clefairy,8,65,8,297,Clefairy
3,VW0008000039,active,Zubat,8,65,8,304,Zubat
4,VW0008000039,active,Nidoking,8,65,8,296,Nidoking
5,VW0008000039,active,Jigglypuff,8,65,8,302,Jigglypuff
6,VW0008000039,active,CharizardMega Charizard X,8,65,8,299,CharizardMega Charizard X
8,VW0008000039,active,Growlithe,8,65,8,310,Growlithe
9,VW0008000039,active,Fearow,8,55,8,286,Fearow
10,VW0008000039,active,Ekans,8,56,8,287,Ekans
11,VW0008000039,active,Arbok,8,57,8,288,Arbok


In [113]:
combined_df.fillna('missing', inplace=True)

In [114]:
for _, row in combined_df.iterrows():
    # a real close date (neither 'active' nor a filled-in NaN) means the partner closed the fund
    if row['PLANINVCLOSEDATE'] not in ('active', 'missing'):
        if row['model_id'] != 'missing':
            # the closed fund maps to a specific Vestwell model
            output_final.at[(row['program_id'], row['model_id']), 'fund_closed'].append(row['FUNDID'])
        else:
            # no model match: broadcast the closed fund across every model in the program
            for model_id in output_final.loc[row['partner_program_id']].index:
                output_final.at[(row['partner_program_id'], model_id), 'fund_closed'].append(row['FUNDID'])
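
The loop above can also be expressed as a function that builds a `{(program_id, model_id): [closed funds]}` mapping instead of mutating `output_final` cell by cell. A minimal sketch mirroring the loop's branches; `closed_funds_by_model` and `models_by_program` are hypothetical, and the rows are assumed to be plain dicts with the `combined_df` column names. Using `.get()` skips partner programs that have no Vestwell models rather than raising `KeyError`, as `output_final.loc[...]` would.

```python
from collections import defaultdict


def closed_funds_by_model(rows, models_by_program):
    """Hypothetical helper mirroring the loop above.

    rows: dicts with keys PLANINVCLOSEDATE, FUNDID, program_id, model_id, partner_program_id
    models_by_program: {program_id: [model_id, ...]} for the Vestwell models
    """
    closed = defaultdict(list)
    for row in rows:
        if row['PLANINVCLOSEDATE'] in ('active', 'missing'):
            continue  # fund still open, or its close date was blank
        if row['model_id'] != 'missing':
            closed[(row['program_id'], row['model_id'])].append(row['FUNDID'])
        else:
            # no model match: flag the fund on every model of the partner's program
            for model_id in models_by_program.get(row['partner_program_id'], []):
                closed[(row['partner_program_id'], model_id)].append(row['FUNDID'])
    return dict(closed)
```

The resulting dict could then be mapped onto the `(program_id, model_id)` index in one step.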


        

In [115]:
output_final

Unnamed: 0_level_0,Unnamed: 1_level_0,funds_missing_at_partner,funds_missing_at_vw,fund_closed
program_id,model_id,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
1.0,259.0,"Paras,Phanpy,Venonat,Donphan,Porygon2","MedichamMega Medicham,Manectric,ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Diglett,Smoochum,Smeargle","[MedichamMega Medicham, Manectric]"
1.0,260.0,"Psyduck,Rattata,Paras,Phanpy,Venonat,Donphan","MedichamMega Medicham,Manectric,ManectricMega Manectric,Growlithe,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle","[MedichamMega Medicham, Manectric]"
1.0,261.0,"Psyduck,Rattata,Dugtrio,Paras,Phanpy,Venonat,Porygon2","MedichamMega Medicham,Manectric,ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle","[MedichamMega Medicham, Manectric]"
1.0,262.0,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy,Venonat","MedichamMega Medicham,Manectric,ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle","[MedichamMega Medicham, Manectric]"
1.0,263.0,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy","MedichamMega Medicham,Manectric,ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle","[MedichamMega Medicham, Manectric]"
1.0,264.0,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy","MedichamMega Medicham,Manectric,ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle","[MedichamMega Medicham, Manectric]"
1.0,265.0,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy","MedichamMega Medicham,Manectric,ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle","[MedichamMega Medicham, Manectric]"
1.0,266.0,"Persian,Psyduck,Rattata,Dugtrio,Paras","MedichamMega Medicham,Manectric,ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle","[MedichamMega Medicham, Manectric]"
1.0,267.0,"Persian,Psyduck,Rattata,Dugtrio,Paras","MedichamMega Medicham,Manectric,ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle","[MedichamMega Medicham, Manectric]"
1.0,268.0,"Persian,Psyduck,Rattata,Dugtrio","MedichamMega Medicham,Manectric,ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle","[MedichamMega Medicham, Manectric]"


### Remove fund_closed from funds_missing_at_vw

In [116]:
def fund_closed(funds_closed, funds_missing):
    # drop closed funds from the comma-separated funds_missing_at_vw string
    if not isinstance(funds_missing, str):
        return funds_missing  # NaN: nothing is missing at Vestwell (filled in later)
    return ','.join(fund for fund in funds_missing.split(',') if fund not in funds_closed)


In [117]:
output_final.funds_missing_at_vw = output_final.apply(lambda x: fund_closed(x.fund_closed,
                                                                            x.funds_missing_at_vw),axis=1)


In [118]:
output_final

Unnamed: 0_level_0,Unnamed: 1_level_0,funds_missing_at_partner,funds_missing_at_vw,fund_closed
program_id,model_id,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
1.0,259.0,"Paras,Phanpy,Venonat,Donphan,Porygon2","ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Diglett,Smoochum,Smeargle","[MedichamMega Medicham, Manectric]"
1.0,260.0,"Psyduck,Rattata,Paras,Phanpy,Venonat,Donphan","ManectricMega Manectric,Growlithe,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle","[MedichamMega Medicham, Manectric]"
1.0,261.0,"Psyduck,Rattata,Dugtrio,Paras,Phanpy,Venonat,Porygon2","ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle","[MedichamMega Medicham, Manectric]"
1.0,262.0,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy,Venonat","ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle","[MedichamMega Medicham, Manectric]"
1.0,263.0,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy","ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle","[MedichamMega Medicham, Manectric]"
1.0,264.0,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy","ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle","[MedichamMega Medicham, Manectric]"
1.0,265.0,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy","ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle","[MedichamMega Medicham, Manectric]"
1.0,266.0,"Persian,Psyduck,Rattata,Dugtrio,Paras","ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle","[MedichamMega Medicham, Manectric]"
1.0,267.0,"Persian,Psyduck,Rattata,Dugtrio,Paras","ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle","[MedichamMega Medicham, Manectric]"
1.0,268.0,"Persian,Psyduck,Rattata,Dugtrio","ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle","[MedichamMega Medicham, Manectric]"


### Clean output_df to match the example format

In [119]:
output_final.fillna('missing', inplace=True)

In [120]:
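def cleaner(x):
    """Map empty/missing markers to 'None' and join list/set cells with commas."""
    # Closed-fund lists/sets become comma-separated strings ('None' if empty)
    if isinstance(x, (list, set)):
        return ','.join(x) if x else 'None'
    # Sentinels from fillna('missing'), stringified NaNs, and empty strings
    if x in ('missing', 'nan', '[nan]', ''):
        return 'None'
    return x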

In [121]:
for col in ['funds_missing_at_vw', 'funds_missing_at_partner', 'fund_closed']:
    output_final[col] = output_final[col].apply(cleaner)
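To illustrate what the two cleaning cells do together, here is a small sketch on a hypothetical three-column frame: `fillna('missing')` turns gaps into a sentinel, and the cleaner maps sentinels and empty strings to `'None'` and joins list cells. `clean_cell` is a standalone, hypothetical copy of that normalization logic so the block runs on its own.

```python
import pandas as pd

def clean_cell(x):
    """Map empty/missing markers to 'None' and join list/set cells with commas."""
    if isinstance(x, (list, set)):
        return ','.join(x) if x else 'None'
    if x in ('missing', 'nan', '[nan]', ''):
        return 'None'
    return x

# Hypothetical rows: one with an emptied vw string, one with gaps
toy = pd.DataFrame({
    'funds_missing_at_partner': ['Paras,Phanpy', None],
    'funds_missing_at_vw': ['', 'Growlithe'],
    'fund_closed': [['MedichamMega Medicham', 'Manectric'], None],
})

toy = toy.fillna('missing')          # None/NaN -> 'missing' sentinel
for col in toy.columns:
    toy[col] = toy[col].apply(clean_cell)
print(toy.to_dict('records'))
```

Doing the cleaning per column with `Series.apply` keeps the list cells intact until `clean_cell` joins them.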

In [122]:
output_final

program_id,model_id,funds_missing_at_partner,funds_missing_at_vw,fund_closed
1.0,259.0,"Paras,Phanpy,Venonat,Donphan,Porygon2","ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Diglett,Smoochum,Smeargle","MedichamMega Medicham,Manectric"
1.0,260.0,"Psyduck,Rattata,Paras,Phanpy,Venonat,Donphan","ManectricMega Manectric,Growlithe,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle","MedichamMega Medicham,Manectric"
1.0,261.0,"Psyduck,Rattata,Dugtrio,Paras,Phanpy,Venonat,Porygon2","ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle","MedichamMega Medicham,Manectric"
1.0,262.0,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy,Venonat","ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle","MedichamMega Medicham,Manectric"
1.0,263.0,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy","ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle","MedichamMega Medicham,Manectric"
1.0,264.0,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy","ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle","MedichamMega Medicham,Manectric"
1.0,265.0,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy","ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle","MedichamMega Medicham,Manectric"
1.0,266.0,"Persian,Psyduck,Rattata,Dugtrio,Paras","ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle","MedichamMega Medicham,Manectric"
1.0,267.0,"Persian,Psyduck,Rattata,Dugtrio,Paras","ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle","MedichamMega Medicham,Manectric"
1.0,268.0,"Persian,Psyduck,Rattata,Dugtrio","ManectricMega Manectric,Growlithe,Torchic,Teddiursa,Tyrogue,Stantler,Hitmontop,Xatu,Gastly,Sneasel,Meowth,Smoochum,Smeargle","MedichamMega Medicham,Manectric"


In [123]:
output_final.reset_index().to_csv('output_df.csv', index=False)
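`reset_index()` moves `program_id` and `model_id` out of the index into ordinary columns so they are written to the CSV, and `index=False` stops pandas from also writing a default 0..n-1 row index. A quick round-trip check on a hypothetical one-row stand-in, using `io.StringIO` instead of a file. Depending on the pandas version, `read_csv` may parse the literal string `'None'` as missing, so the sketch reads it back with `keep_default_na=False`.

```python
import io
import pandas as pd

# Hypothetical one-row stand-in for output_final, indexed by (program_id, model_id)
toy = pd.DataFrame(
    {'funds_missing_at_partner': ['Paras'],
     'funds_missing_at_vw': ['None'],
     'fund_closed': ['Manectric']},
    index=pd.MultiIndex.from_tuples([(1.0, 259.0)], names=['program_id', 'model_id']),
)

# Index levels become columns; the default row index is not written
buf = io.StringIO()
toy.reset_index().to_csv(buf, index=False)
buf.seek(0)

# keep_default_na=False keeps 'None' as a string instead of NaN
back = pd.read_csv(buf, keep_default_na=False)
print(list(back.columns))
print(back.loc[0, 'funds_missing_at_vw'])
```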