# Vestwell Python Screener

If you have any questions or feel you are making assumptions, please record them in this notebook or in comments if you'd rather work in a `.py` file.  If you get stuck, try to explain in words how you would complete the exercise.

### Background

Vestwell provides a wide variety of investment choices to its users.  Participants in a retirement plan can choose between a pre-determined set of funds or they can choose their own custom set of funds from a list of choices.  Advisors can create their own models with a custom set of funds in which participants can choose to invest.  As a result, there are thousands of unique models on the Vestwell platform.  

One of Vestwell's partners has the same list of models in its database.  This partner will maintain an up-to-date list of funds for each model in their database.  For example, when a fund closes and is replaced by a new one, Vestwell's partner will update the model with the new fund in their database, but not in Vestwell's.  For this reason, Vestwell's database and our partner's database will get out of sync over time.  Unless, that is, you can build a Python script to reconcile the two databases.  We're rooting for you!

### The Data

Here's a high-level overview of the data.  We'll get into more details below as we dig in.

**Vestwell Data**

Each `program_id` has many `model_id`s.  Each `model_id` has many `symbol`s.

* model.csv:  Associations of programs to models.
* model_prop.csv:  Association of models to symbols.

**Partner Data**

Each `PLANID` has many `FUNDID`s.

* partner.csv:  Association of `PLANID` to `FUNDID` and `PLANINVCLOSEDATE`.  

**Some extra notes**

* The `FUNDID` in our partner's data is equivalent to `symbol` in Vestwell's data.  These are also referred to as "funds".
* The `PLANID` in our partner's data has information that is equivalent to the `program_id` in Vestwell's database (more details below in Step 2).
* The `PLANINVCLOSEDATE` in our partner's database is the date when a fund was closed.  If there isn't a date, then the fund has not been closed.
* Sometimes our partner has funds called either "Medicham" or "Electrike" which we ignore.
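
As a rough illustration of the close-date note above, the sketch below flags closed funds in a toy frame. It assumes that open funds carry a non-date placeholder (the sample rows shown later in this notebook use the string `active`) and that dates are written month/day/year; both are assumptions to confirm against the real file. The fund names here are made up.

```python
import pandas as pd

# Toy rows for illustration only; fund names and dates are hypothetical.
partner_sample = pd.DataFrame({
    "FUNDID": ["A", "B", "C"],
    "PLANINVCLOSEDATE": ["active", "11/01/2018", None],
})

# Anything that is not a valid date (a placeholder such as "active", or a blank)
# becomes NaT. The month/day/year format is an assumption.
close_date = pd.to_datetime(
    partner_sample["PLANINVCLOSEDATE"], format="%m/%d/%Y", errors="coerce"
)
partner_sample["is_closed"] = close_date.notna()

closed_funds = partner_sample.loc[partner_sample["is_closed"], "FUNDID"].tolist()
print(closed_funds)
```

Coercing to a real datetime column also makes it possible to compare close dates against a cutoff later, if that turns out to matter.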

### Goal
The goal of this exercise is to compare Vestwell's data with our partner's data.  We want to figure out if Vestwell's model data is the same as our partner's data.  We consider our partner's database the source of truth since their database will stay up to date if there are any changes to funds.  Here's specifically what we are asking:

1.  Does the list of funds for each `model_id` in each `program_id` in Vestwell's database match the list of funds for that `program_id` in our partner's database?  If there are any mismatches, what funds are missing from each database?  Remember, our partner doesn't use `model_id`, so whichever funds they have for a particular `program_id` should match the list of funds that Vestwell has for any `model_id` that uses that `program_id`.

For example, if Vestwell's database has funds A, B and C for a `model_id` in a particular `program_id` and our partner's database has funds B, C, and D for the same `program_id` we would report that fund A is missing from our partner's database and that fund D is missing from our database for that `model_id` and `program_id`.

2.  Are there any funds in Vestwell's database that have closed?  If so, what are they for each `model_id` in each `program_id`?

For example, if our database has funds D, E, and F for a certain `model_id` in a `program_id` and our partner's database shows that fund D closed on 11/1/2019, then we would report that fund D has closed for that `model_id` and `program_id`.
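
The two examples above reduce to set operations. The sketch below restates them with plain Python sets; the variable names are illustrative, not part of the exercise.

```python
# Question 1: funds A, B, C at Vestwell vs. B, C, D at the partner.
vestwell_funds = {"A", "B", "C"}
partner_funds = {"B", "C", "D"}

missing_at_partner = sorted(vestwell_funds - partner_funds)   # only at Vestwell
missing_at_vestwell = sorted(partner_funds - vestwell_funds)  # only at the partner

# Question 2: funds D, E, F at Vestwell, with D closed at the partner.
model_funds = {"D", "E", "F"}
partner_closed_funds = {"D"}

closed_in_model = sorted(model_funds & partner_closed_funds)

print(missing_at_partner, missing_at_vestwell, closed_in_model)
```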

Ideally, the output is in a form that can be passed to a Business Analyst to take action on.  For example, the output could look something like this:

| program_id | model_id | fund_missing_at_vw | fund_missing_at_partner | fund_closed |
|------------|--|---------------|--------------------|--------|
| 1          |1| None          | None               | None   |
| 1          |2| D, Z             | A                  | F   |
| 2          |2| None             | None               | F      |

### Some Hints
* We value correct output over efficient code.  
* Does your code execute fully without errors?
* What edge cases have you considered?  How could you handle them?
* Could another engineer read your code and easily understand what's going on and why you did things a certain way?
* Most analysts don't use Python or Jupyter Notebooks.  How could you give them the output of your code?
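
On the last hint, one common option is a CSV file that opens in a spreadsheet. The sketch below rebuilds the example output table from the Goal section and serializes it; it writes to an in-memory buffer so it has no side effects, and the file name in the final comment is hypothetical.

```python
import io
import pandas as pd

# Rows copied from the example output table in the Goal section.
report = pd.DataFrame({
    "program_id": [1, 1, 2],
    "model_id": [1, 2, 2],
    "fund_missing_at_vw": ["None", "D, Z", "None"],
    "fund_missing_at_partner": ["None", "A", "None"],
    "fund_closed": ["None", "F", "F"],
})

# index=False keeps the pandas row index out of the file.
buffer = io.StringIO()
report.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)

# To write an actual file instead (hypothetical name):
# report.to_csv("reconciliation_report.csv", index=False)
```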

You should find the following mismatches for `program_id` 1 and `model_id` 268:
* Funds missing at Vestwell: 
    
        'Hitmontop',
        'Smoochum',
        'Gastly',
        'Teddiursa',
        'Meowth',
        'Sneasel',
        'Xatu',
        'Growlithe',
        'Torchic',
        'ManectricMega Manectric',
        'Smeargle',
        'Stantler',
        'Tyrogue'
    
    
* Funds missing at Partner:
    
        'Persian', 'Psyduck', 'Rattata', 'Dugtrio'
    
    
* Closed funds:

        None

## Step 0
Import any packages you'll need

In [1]:
import pandas as pd
import numpy as np
import math

## Step 1
Import `partner.csv`, `model.csv`, and `model_prop.csv`.

In [2]:
model = pd.read_csv('data_V2/model.csv')
model.head()

Unnamed: 0,model_id,program_id
0,28,3
1,34,4
2,42,4
3,24,3
4,64,8


In [3]:
model.shape

(187, 2)

In [4]:
model_prop = pd.read_csv('data_V2/model_props.csv')
model_prop.head()

Unnamed: 0,model_props_id,model_id,symbol
0,541,80,Bulbasaur
1,542,80,Ivysaur
2,543,80,Venusaur
3,544,80,VenusaurMega Venusaur
4,545,80,Charmander


In [5]:
model_prop.shape

(729, 3)

In [6]:
partner = pd.read_csv('data_V2/partner.csv')
partner.head()

Unnamed: 0,PLANID,PLANINVCLOSEDATE,FUNDID
0,VW0008000039,active,Medicham
1,VW0008000039,active,Arcanine
2,VW0008000039,11/01/2018,Clefairy
3,VW0008000039,active,Zubat
4,VW0008000039,active,Nidoking


In [7]:
partner.shape

(445, 3)

## Step 2 - working with the `partner.csv` data
Extract the `program_id` from the `PLANID` column in the `partner` dataframe.  The `program_id` is the first four characters in `PLANID` after "VW".  It's usually an integer.  If instead of digits, those characters are equal to "PALL" then the `program_id` = 1.  Drop any other rows remaining that do not have four digits in the first four characters after "VW" in the `PLANID` column.  For example, if a row in `PLANID` has `VWLASP000` then it should be dropped because it has `LASP` after `VW` instead of four digits.
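
The rule above can also be written as one regular expression applied to the whole column. This is only a sketch of that alternative: `VWPALL000001` is a made-up PLANID, and the pattern assumes every valid PLANID starts with `VW`.

```python
import pandas as pd

# Sample PLANIDs: one numeric, one hypothetical "PALL" value, one to be dropped.
plan_ids = pd.Series(["VW0008000039", "VWPALL000001", "VWLASP000"])

# Capture either four digits or the literal "PALL" right after "VW";
# rows that match neither come back as NaN.
code = plan_ids.str.extract(r"^VW(\d{4}|PALL)", expand=False)

# "PALL" maps to program 1; to_numeric strips leading zeros ("0008" -> 8).
program_id = pd.to_numeric(code.replace("PALL", "1")).astype("Int64")

valid = program_id.notna()
print(program_id[valid].tolist())
```

The nullable `Int64` dtype keeps the IDs as integers even while unmatched rows are still present.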

In [8]:
# Extract the program number from a partner PLANID value.
# Returns NaN for anything that is neither 'PALL' nor four digits, so those rows can be dropped.
def get_plan(plan_id):
    code = plan_id[2:6]
    if code == 'PALL':
        return 1
    # int() strips the leading zeros, e.g. '0008' -> 8
    if len(code) == 4 and code.isdigit():
        return int(code)
    # otherwise return NaN so the row is dropped later
    return np.nan

In [9]:
partner.head(10)

Unnamed: 0,PLANID,PLANINVCLOSEDATE,FUNDID
0,VW0008000039,active,Medicham
1,VW0008000039,active,Arcanine
2,VW0008000039,11/01/2018,Clefairy
3,VW0008000039,active,Zubat
4,VW0008000039,active,Nidoking
5,VW0008000039,active,Jigglypuff
6,VW0008000039,active,CharizardMega Charizard X
7,VW0008000039,active,Electrike
8,VW0008000039,active,Growlithe
9,VW0008000039,active,Fearow


In [10]:
partner['partner_program_id'] = 0

In [11]:
partner['partner_program_id'] = partner['PLANID'].apply(get_plan)

In [12]:
# drop the rows where no program_id could be extracted (NaN)
partner.dropna(subset=['partner_program_id'], inplace=True)

In [13]:
partner.partner_program_id.value_counts()

27    39
9     36
15    35
8     31
12    28
30    26
21    25
14    25
13    25
20    24
17    24
29    22
19    20
16    19
1     18
28    17
24    16
10    15
Name: partner_program_id, dtype: int64

In [14]:
partner.shape

(445, 4)

In [15]:
partner.head(10)

Unnamed: 0,PLANID,PLANINVCLOSEDATE,FUNDID,partner_program_id
0,VW0008000039,active,Medicham,8
1,VW0008000039,active,Arcanine,8
2,VW0008000039,11/01/2018,Clefairy,8
3,VW0008000039,active,Zubat,8
4,VW0008000039,active,Nidoking,8
5,VW0008000039,active,Jigglypuff,8
6,VW0008000039,active,CharizardMega Charizard X,8
7,VW0008000039,active,Electrike,8
8,VW0008000039,active,Growlithe,8
9,VW0008000039,active,Fearow,8


## Step 3
Check if the funds match for each `program_id`.  In `partner.csv` the funds are in the `FUNDID` column and for `model_prop.csv` the funds are in the `symbol` column.  If there are any mismatches, return a list of which funds are missing from each database for each `model_id` in each `program_id`.
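
One possible shape for this comparison is to build a set of funds per program on the partner side and a set per model on the Vestwell side, then take set differences. The sketch below does that on toy data; it is an illustration under those assumptions rather than the approach the cells below follow, and the frame names are hypothetical.

```python
import pandas as pd

# Toy data: program 1 has two models at Vestwell; the partner lists B, C, D.
vw_toy = pd.DataFrame({
    "program_id": [1, 1, 1, 1, 1],
    "model_id": [10, 10, 10, 11, 11],
    "symbol": ["A", "B", "C", "B", "C"],
})
partner_toy = pd.DataFrame({
    "program_id": [1, 1, 1],
    "FUNDID": ["B", "C", "D"],
})

# One set of funds per program (partner) and per program/model (Vestwell).
partner_sets = partner_toy.groupby("program_id")["FUNDID"].apply(set)
model_sets = vw_toy.groupby(["program_id", "model_id"])["symbol"].apply(set)

rows = []
for (program_id, model_id), held in model_sets.items():
    # A program absent from the partner data compares against an empty set.
    expected = partner_sets.get(program_id, set())
    rows.append({
        "program_id": program_id,
        "model_id": model_id,
        "fund_missing_at_vw": ", ".join(sorted(expected - held)) or "None",
        "fund_missing_at_partner": ", ".join(sorted(held - expected)) or "None",
    })

report = pd.DataFrame(rows)
print(report)
```

Because every model is compared against the full partner set for its program, a fund that one model holds and another lacks is still reported for the model that lacks it.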

### Merge model and model_prop df on model_id column

In [16]:
vw_df = pd.merge(model, model_prop,  how='left', left_on=['model_id'], 
                     right_on = ['model_id'])

#### Note: no model_ids were dropped by the left join

In [17]:
model.model_id.value_counts().count()

187

In [18]:
model_prop.model_id.value_counts().count()

187
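
Equal unique counts suggest the two files cover the same models, but they do not prove it, since each side could hold IDs the other lacks. A small sketch of a stricter check on toy frames (all values hypothetical):

```python
import pandas as pd

model_toy = pd.DataFrame({"model_id": [1, 2, 3], "program_id": [7, 7, 8]})
model_prop_toy = pd.DataFrame({
    "model_id": [1, 1, 2, 4],
    "symbol": ["A", "B", "A", "C"],
})

ids_in_model = set(model_toy["model_id"])
ids_in_props = set(model_prop_toy["model_id"])

# Models with no fund rows, and fund rows pointing at an unknown model.
models_without_funds = ids_in_model - ids_in_props
orphan_fund_models = ids_in_props - ids_in_model

print(models_without_funds, orphan_fund_models)
```

In this toy case both sides have three unique IDs, yet neither difference is empty.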

#### Looking at values (delete later)

In [30]:
model[model.model_id==28]

Unnamed: 0,model_id,program_id
0,28,3


In [29]:
model_prop[model_prop.model_id==28]

Unnamed: 0,model_props_id,model_id,symbol
13,119,28,Caterpie
14,120,28,Metapod
15,121,28,Butterfree
16,122,28,Weedle
17,123,28,Kakuna
26,124,28,Beedrill
27,125,28,BeedrillMega Beedrill
28,126,28,Pidgey
29,127,28,Pidgeotto
30,128,28,Pidgeot


In [32]:
vw_df.head(10)

Unnamed: 0,model_id,program_id,model_props_id,symbol
0,28,3,119,Caterpie
1,28,3,120,Metapod
2,28,3,121,Butterfree
3,28,3,122,Weedle
4,28,3,123,Kakuna
5,28,3,124,Beedrill
6,28,3,125,BeedrillMega Beedrill
7,28,3,126,Pidgey
8,28,3,127,Pidgeotto
9,28,3,128,Pidgeot


In [24]:
vw_df.shape

(729, 4)

In [25]:
vw_df.isnull().sum()

model_id          0
program_id        0
model_props_id    0
symbol            0
dtype: int64

### Now combine vw_df with partner

In [19]:
combined_df = pd.merge(partner, vw_df,  how='outer', left_on=['FUNDID', 'partner_program_id'], 
                     right_on = ['symbol','program_id'])

In [20]:
combined_df

Unnamed: 0,PLANID,PLANINVCLOSEDATE,FUNDID,partner_program_id,model_id,program_id,model_props_id,symbol
0,VW0008000039,active,Medicham,8.0,,,,
1,VW0008000039,active,Arcanine,8.0,65.0,8.0,311.0,Arcanine
2,VW0008000039,11/01/2018,Clefairy,8.0,65.0,8.0,297.0,Clefairy
3,VW0008000039,active,Zubat,8.0,65.0,8.0,304.0,Zubat
4,VW0008000039,active,Nidoking,8.0,65.0,8.0,296.0,Nidoking
...,...,...,...,...,...,...,...,...
780,,,,,278.0,26.0,1962.0,Metapod
781,,,,,279.0,26.0,1963.0,Butterfree
782,,,,,280.0,26.0,1964.0,Weedle
783,,,,,281.0,26.0,1965.0,Kakuna


In [21]:
combined_df.shape

(785, 8)

In [22]:
combined_df.program_id.value_counts().sum()

729

In [23]:
combined_df.partner_program_id.value_counts().sum()

509
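
To see which side each row of an outer merge came from, `pd.merge` accepts `indicator=True`, which adds a `_merge` column. The sketch below uses toy frames with the same column names as above; the data is made up.

```python
import pandas as pd

partner_toy = pd.DataFrame({"partner_program_id": [1, 1], "FUNDID": ["B", "D"]})
vw_toy = pd.DataFrame({
    "program_id": [1, 1],
    "model_id": [10, 10],
    "symbol": ["A", "B"],
})

merged = pd.merge(
    partner_toy, vw_toy, how="outer",
    left_on=["FUNDID", "partner_program_id"],
    right_on=["symbol", "program_id"],
    indicator=True,  # adds a "_merge" column: left_only / right_only / both
)

only_partner = merged.loc[merged["_merge"] == "left_only", "FUNDID"].tolist()
only_vw = merged.loc[merged["_merge"] == "right_only", "symbol"].tolist()
print(only_partner, only_vw)
```

One caveat with a row-level join like this: a partner fund that matches any one model in a program counts as matched, so it is not flagged for the other models in that program that lack it.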

### Drop rows with "Medicham" or "Electrike"

In [24]:
dropindex = combined_df[combined_df['FUNDID'] == 'Medicham' ].index

In [25]:
#delete later
dropindex

Int64Index([0, 76, 91, 145, 223, 272, 305, 331, 347, 396, 443, 482], dtype='int64')

In [26]:
combined_df.drop(dropindex, inplace=True)

In [27]:
dropindex = combined_df[combined_df['FUNDID'] == 'Electrike' ].index

In [28]:
#delete later
dropindex

Int64Index([  7,  61,  88, 114, 131, 152, 185, 220, 229, 262, 295, 320, 342,
            372, 408, 430, 450, 500],
           dtype='int64')

In [29]:
combined_df.drop(dropindex, inplace=True)
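
The two drops above could also be expressed as a single boolean filter with `isin`. A sketch on a toy frame, where `IGNORED_FUNDS` is a hypothetical name:

```python
import pandas as pd

IGNORED_FUNDS = ["Medicham", "Electrike"]

toy = pd.DataFrame({
    "FUNDID": ["Medicham", "Arcanine", "Electrike", None, "MedichamMega Medicham"],
})

# isin matches whole values only, so "MedichamMega Medicham" is kept,
# and rows with no FUNDID are kept as well.
kept = toy[~toy["FUNDID"].isin(IGNORED_FUNDS)]
print(kept)
```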

In [30]:
combined_df.fillna('missing', inplace=True)

In [31]:
pd.set_option('display.max_rows', 800)

In [50]:
pd.set_option('display.max_colwidth', None)

### New Approach

In [32]:
#can outer join the below 2 df

In [33]:
df1 =combined_df.groupby(['partner_program_id','program_id','model_id'])['FUNDID'].apply(','.join).reset_index()

In [34]:
df2 = combined_df.groupby(['partner_program_id','program_id','model_id'])['symbol'].apply(','.join).reset_index()

In [35]:
df2

Unnamed: 0,partner_program_id,program_id,model_id,symbol
0,1,1,260,"Diglett,Torchic"
1,1,1,261,Diglett
2,1,1,262,Diglett
3,1,1,263,Diglett
4,1,1,264,Diglett
5,1,1,265,Diglett
6,1,1,266,Diglett
7,1,1,267,Diglett
8,1,1,268,Diglett
9,1,1,269,"Hitmontop,Smoochum,Diglett,Gastly,Teddiursa,Me..."


In [36]:
c_df = pd.merge(df1, df2,  how='outer',on=['partner_program_id','program_id','model_id'])

In [100]:
c_df

Unnamed: 0,partner_program_id,program_id,model_id,FUNDID,symbol
0,1,1,260,"Diglett,Torchic","Diglett,Torchic"
1,1,1,261,Diglett,Diglett
2,1,1,262,Diglett,Diglett
3,1,1,263,Diglett,Diglett
4,1,1,264,Diglett,Diglett
5,1,1,265,Diglett,Diglett
6,1,1,266,Diglett,Diglett
7,1,1,267,Diglett,Diglett
8,1,1,268,Diglett,Diglett
9,1,1,269,"Hitmontop,Smoochum,Diglett,Gastly,Teddiursa,Meowth,Sneasel,Xatu,Growlithe,Smeargle,Stantler,Tyrogue","Hitmontop,Smoochum,Diglett,Gastly,Teddiursa,Meowth,Sneasel,Xatu,Growlithe,Smeargle,Stantler,Tyrogue"


In [None]:
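# Plan (notes only; the cells below work toward it).
# funds_missing_at_partner: `symbol` for the rows where partner_program_id == 'missing'
funds_missing_at_partner = c_df.loc[c_df.partner_program_id == 'missing', 'symbol']

# funds_missing_at_vw (still to do), using partner_program_id == 1 as the example:
#   FUNDID where (partner_program_id == 1) & (program_id != 'missing'), PLUS
#   the set of FUNDID where (partner_program_id == 1) & (program_id != 'missing')
#   that are not in c_df.at[model_id.index, 'symbol']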


In [38]:
# output_df_1 = c_df.groupby(['program_id','model_id']).size().reset_index()

In [51]:
# pd.merge(output_df_1, output_df_2, how='left', left_on=['program_id','model_id'], 
#                      right_on = ['program_id','model_id'])

Unnamed: 0,program_id,model_id,0,funds_missing_at_partner
0,1,259,1,"Paras,Phanpy,Venonat,Donphan,Porygon2"
1,1,260,2,"Psyduck,Rattata,Paras,Phanpy,Venonat,Donphan"
2,1,261,2,"Psyduck,Rattata,Dugtrio,Paras,Phanpy,Venonat,Porygon2"
3,1,262,2,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy,Venonat"
4,1,263,2,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy"
5,1,264,2,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy"
6,1,265,2,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy"
7,1,266,2,"Persian,Psyduck,Rattata,Dugtrio,Paras"
8,1,267,2,"Persian,Psyduck,Rattata,Dugtrio,Paras"
9,1,268,2,"Persian,Psyduck,Rattata,Dugtrio"


### Make Output_df

#### Create frame for baseline for output_df

In [79]:
output_df_1 = c_df.groupby(['program_id','model_id']).size().reset_index()

#### Assign initial funds_missing_at_partner column (part 1)

In [80]:
idx = c_df.loc[c_df.partner_program_id =='missing'].index
output_df_2 = c_df.loc[idx, ['program_id', 'model_id', 'symbol']]
output_df_2.rename(columns={"symbol": "funds_missing_at_partner"}, inplace=True)

In [81]:
output_df_2

Unnamed: 0,program_id,model_id,funds_missing_at_partner
146,1,259,"Paras,Phanpy,Venonat,Donphan,Porygon2"
147,1,260,"Psyduck,Rattata,Paras,Phanpy,Venonat,Donphan"
148,1,261,"Psyduck,Rattata,Dugtrio,Paras,Phanpy,Venonat,Porygon2"
149,1,262,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy,Venonat"
150,1,263,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy"
151,1,264,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy"
152,1,265,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy"
153,1,266,"Persian,Psyduck,Rattata,Dugtrio,Paras"
154,1,267,"Persian,Psyduck,Rattata,Dugtrio,Paras"
155,1,268,"Persian,Psyduck,Rattata,Dugtrio"


#### Join values to baseline output_df

In [82]:
output_df = pd.merge(output_df_1, output_df_2, how='left', left_on=['program_id','model_id'], 
                     right_on = ['program_id','model_id'])
output_df.drop(columns=[0], inplace=True)

In [83]:
output_df

Unnamed: 0,program_id,model_id,funds_missing_at_partner
0,1,259,"Paras,Phanpy,Venonat,Donphan,Porygon2"
1,1,260,"Psyduck,Rattata,Paras,Phanpy,Venonat,Donphan"
2,1,261,"Psyduck,Rattata,Dugtrio,Paras,Phanpy,Venonat,Porygon2"
3,1,262,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy,Venonat"
4,1,263,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy"
5,1,264,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy"
6,1,265,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy"
7,1,266,"Persian,Psyduck,Rattata,Dugtrio,Paras"
8,1,267,"Persian,Psyduck,Rattata,Dugtrio,Paras"
9,1,268,"Persian,Psyduck,Rattata,Dugtrio"


#### Assign funds_missing_at_partner column (part 2) 
- Edge case: the partner's database has no rows at all for a Vestwell `program_id`.

In [84]:
#gets list of program_id completely missing at partner
missing_program_id_partner = []
for i in c_df.program_id.unique():
    if i not in c_df.partner_program_id.unique():
        missing_program_id_partner.append(i)

In [85]:
missing_program_id_partner

[3.0, 4.0, 11.0, 22.0, 26.0]

In [86]:
# output_df[output_df.program_id == 3].funds_missing_at_partner.values

In [87]:
for miss_id in missing_program_id_partner:
    mask = output_df.program_id == miss_id
    funds = []
    for item in output_df.loc[mask, 'funds_missing_at_partner']:
        # skip NaN entries; only comma-joined strings carry fund names
        if isinstance(item, str):
            funds.extend(item.split(','))
    # assign the combined set of funds to every model in the current program_id
    output_df.loc[mask, 'funds_missing_at_partner'] = str(set(funds))

In [95]:
output_df[output_df.program_id == 3]#.funds_missing_at_partner

Unnamed: 0,program_id,model_id,funds_missing_at_partner,funds_missing_at_vw
11,3,17,"{'BeedrillMega Beedrill', 'Butterfree', 'Pidgeot', 'PidgeotMega Pidgeot', 'Pidgeotto', 'Kakuna', 'Caterpie', 'Beedrill', 'Weedle', 'Pidgey', 'Metapod'}",
12,3,18,"{'BeedrillMega Beedrill', 'Butterfree', 'Pidgeot', 'PidgeotMega Pidgeot', 'Pidgeotto', 'Kakuna', 'Caterpie', 'Beedrill', 'Weedle', 'Pidgey', 'Metapod'}",
13,3,19,"{'BeedrillMega Beedrill', 'Butterfree', 'Pidgeot', 'PidgeotMega Pidgeot', 'Pidgeotto', 'Kakuna', 'Caterpie', 'Beedrill', 'Weedle', 'Pidgey', 'Metapod'}",
14,3,20,"{'BeedrillMega Beedrill', 'Butterfree', 'Pidgeot', 'PidgeotMega Pidgeot', 'Pidgeotto', 'Kakuna', 'Caterpie', 'Beedrill', 'Weedle', 'Pidgey', 'Metapod'}",
15,3,21,"{'BeedrillMega Beedrill', 'Butterfree', 'Pidgeot', 'PidgeotMega Pidgeot', 'Pidgeotto', 'Kakuna', 'Caterpie', 'Beedrill', 'Weedle', 'Pidgey', 'Metapod'}",
16,3,22,"{'BeedrillMega Beedrill', 'Butterfree', 'Pidgeot', 'PidgeotMega Pidgeot', 'Pidgeotto', 'Kakuna', 'Caterpie', 'Beedrill', 'Weedle', 'Pidgey', 'Metapod'}",
17,3,23,"{'BeedrillMega Beedrill', 'Butterfree', 'Pidgeot', 'PidgeotMega Pidgeot', 'Pidgeotto', 'Kakuna', 'Caterpie', 'Beedrill', 'Weedle', 'Pidgey', 'Metapod'}",
18,3,24,"{'BeedrillMega Beedrill', 'Butterfree', 'Pidgeot', 'PidgeotMega Pidgeot', 'Pidgeotto', 'Kakuna', 'Caterpie', 'Beedrill', 'Weedle', 'Pidgey', 'Metapod'}",
19,3,28,"{'BeedrillMega Beedrill', 'Butterfree', 'Pidgeot', 'PidgeotMega Pidgeot', 'Pidgeotto', 'Kakuna', 'Caterpie', 'Beedrill', 'Weedle', 'Pidgey', 'Metapod'}",


#### Create and assign funds_missing_at_vw column (part 1)

##### Initial assignment
Get each (`partner_program_id`, `FUNDID`) pair where `program_id == 'missing'`, i.e. partner funds that matched no Vestwell row.

In [91]:
missing_vw = c_df.loc[c_df.loc[c_df.program_id =='missing'].index, 
                      ['partner_program_id', 'FUNDID']].set_index('partner_program_id')

In [97]:
# Look up the funds missing at Vestwell for a given program_id.
# If missing_vw has an entry for that partner_program_id, return its FUNDID string.
def pop_missing_at_vw_part_one(partner_program_id):
    # this program_id has funds that exist only at the partner
    try:
        return missing_vw.at[partner_program_id, 'FUNDID']
    # no usable entry for this program_id: keep it empty for now
    except (KeyError, TypeError, ValueError):
        return np.nan

In [96]:
missing_vw

Unnamed: 0_level_0,FUNDID
partner_program_id,Unnamed: 1_level_1
1.0,"MedichamMega Medicham,Manectric,ManectricMega Manectric"
8.0,Carvanha
9.0,Sharpedo
15.0,"Charmander,SharpedoMega Sharpedo,CharizardMega Charizard Y,BlastoiseMega Blastoise,CharizardMega Charizard X,Wailmer"
24.0,Wailord
27.0,"Plusle,Sharpedo"
28.0,"Bulbasaur,Plusle"
29.0,"MedichamMega Medicham,Minun,Volbeat,Illumise,Venomoth,Roselia,Parasect,Golduck,Gulpin"
30.0,Numel


In [93]:
#assign the missing FUNDIDS to all program_ids
output_df['funds_missing_at_vw'] = output_df.program_id.apply(pop_missing_at_vw_part_one)

In [94]:
output_df

Unnamed: 0,program_id,model_id,funds_missing_at_partner,funds_missing_at_vw
0,1,259,"Paras,Phanpy,Venonat,Donphan,Porygon2","MedichamMega Medicham,Manectric,ManectricMega Manectric"
1,1,260,"Psyduck,Rattata,Paras,Phanpy,Venonat,Donphan","MedichamMega Medicham,Manectric,ManectricMega Manectric"
2,1,261,"Psyduck,Rattata,Dugtrio,Paras,Phanpy,Venonat,Porygon2","MedichamMega Medicham,Manectric,ManectricMega Manectric"
3,1,262,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy,Venonat","MedichamMega Medicham,Manectric,ManectricMega Manectric"
4,1,263,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy","MedichamMega Medicham,Manectric,ManectricMega Manectric"
5,1,264,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy","MedichamMega Medicham,Manectric,ManectricMega Manectric"
6,1,265,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy","MedichamMega Medicham,Manectric,ManectricMega Manectric"
7,1,266,"Persian,Psyduck,Rattata,Dugtrio,Paras","MedichamMega Medicham,Manectric,ManectricMega Manectric"
8,1,267,"Persian,Psyduck,Rattata,Dugtrio,Paras","MedichamMega Medicham,Manectric,ManectricMega Manectric"
9,1,268,"Persian,Psyduck,Rattata,Dugtrio","MedichamMega Medicham,Manectric,ManectricMega Manectric"


In [98]:
# idx = c_df.loc[c_df.program_id =='missing'].index

In [99]:
# c_df.loc[c_df.program_id =='missing']

##### Complete Population across all (part 2)

In [None]:
# TODO: for each model_id, take the set of FUNDID where
# (partner_program_id == 1) & (program_id != 'missing')
# and keep the funds that are not in c_df.at[model_id.index, 'symbol']
 


##### Create agg lists for program_id & partner_program_id pairs

In [142]:
remainder_groupby = c_df.replace({'missing': np.nan}).groupby(['partner_program_id',
                                                               'program_id'])['FUNDID'].apply(','.join).reset_index()


In [143]:
remainder_groupby.rename(columns={"FUNDID": "fund_agg"}, inplace=True)
remainder_groupby.fund_agg = remainder_groupby.fund_agg.apply(lambda x: set(x.split(',')))
remainder_groupby

Unnamed: 0,partner_program_id,program_id,fund_agg
0,1.0,1.0,"{Diglett, Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu}"
1,8.0,8.0,"{Arbok, Clefairy, Pikachu, Nidoqueen, Nidorino, Growlithe, Zubat, CharizardMega Charizard X, Sandshrew, Clefable, Sandslash, Spearow, Fearow, Oddish, Raichu, Raticate, Vileplume, Golbat, Ninetales, Vulpix, Primeape, Wigglytuff, Nidoking, Jigglypuff, Arcanine, Gloom, Nidorina, Ekans}"
2,9.0,9.0,"{Tentacool, Venusaur, Abra, Graveler, Charmander, VenusaurMega Venusaur, Zubat, Machop, Slowbro, Doduo, Kadabra, Machoke, Geodude, AlakazamMega Alakazam, Magneton, Weepinbell, Golem, Alakazam, Poliwhirl, Squirtle, Ponyta, Poliwag, BlastoiseMega Blastoise, Victreebel, Tentacruel, Farfetch'd, Slowpoke, Poliwrath, Blastoise, Magnemite, SlowbroMega Slowbro, Bellsprout, Rapidash, Machamp}"
3,10.0,10.0,"{Squirtle, Venusaur, CharizardMega Charizard X, Ivysaur, Charmander, Blastoise, VenusaurMega Venusaur, Charizard, BlastoiseMega Blastoise, Charmeleon, CharizardMega Charizard Y, Wartortle, Bulbasaur}"
4,12.0,12.0,"{Charmander, VenusaurMega Venusaur, Charizard, Dewgong, Slowbro, Doduo, Shellder, CharizardMega Charizard X, Magneton, Golem, Squirtle, Ponyta, Muk, BlastoiseMega Blastoise, CharizardMega Charizard Y, Cloyster, Farfetch'd, Seel, Slowpoke, Grimer, Blastoise, Magnemite, Charmeleon, SlowbroMega Slowbro, Rapidash, Dodrio}"
5,13.0,13.0,"{Lickitung, Haunter, Slowbro, Doduo, CharizardMega Charizard X, Geodude, Weezing, Hitmonlee, Koffing, Rhyhorn, Magneton, Ponyta, Exeggutor, Marowak, Farfetch'd, Voltorb, Slowpoke, Cubone, Exeggcute, Magnemite, SlowbroMega Slowbro, Rapidash, Electrode, Hitmonchan}"
6,14.0,14.0,"{Mew, Zapdos, Charmander, Aerodactyl, Gastly, Quilava, Bulbasaur, Moltres, Snorlax, Articuno, MewtwoMega Mewtwo X, AerodactylMega Aerodactyl, Mewtwo, Bayleef, Cyndaquil, MewtwoMega Mewtwo Y, Magikarp, Meganium, Vulpix, Dratini, Dragonite, Dragonair, Chikorita}"
7,15.0,15.0,"{Arbok, Pikachu, Nidoqueen, Nidorino, Yanma, Politoed, Umbreon, Murkrow, Misdreavus, Sandshrew, Sunflora, Jumpluff, Sandslash, Quagsire, Spearow, Skiploom, Fearow, Wooper, Raichu, Raticate, Sunkern, Magikarp, Aipom, Espeon, Slowking, Nidorina, Hoppip, Ekans}"
8,16.0,16.0,"{Girafarig, Wobbuffet, Forretress, Goldeen, CharizardMega Charizard X, Snubbull, Pineco, Totodile, SteelixMega Steelix, BlastoiseMega Blastoise, Dunsparce, VenusaurMega Venusaur, Charmeleon, CharizardMega Charizard Y, Gastly, Unown, Gligar, Snorlax}"
9,17.0,17.0,"{Pichu, Mareep, Ariados, Crobat, Cleffa, Ampharos, Lanturn, Ledian, Marill, Xatu, Spinarak, Flaaffy, Natu, Bellossom, AmpharosMega Ampharos, Togepi, Azumarill, Sudowoodo, Togetic, Igglybuff, Chinchou, Chikorita}"


In [153]:
remainder_groupby=remainder_groupby.astype({'partner_program_id': 'object','program_id': 'object'})

In [154]:
# c_df.astype({'partner_program_id': 'float','program_id': 'float'})

In [155]:
remainder_groupby.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18 entries, 0 to 17
Data columns (total 3 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   partner_program_id  18 non-null     object
 1   program_id          18 non-null     object
 2   fund_agg            18 non-null     object
dtypes: object(3)
memory usage: 560.0+ bytes


In [156]:
c_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 217 entries, 0 to 216
Data columns (total 5 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   partner_program_id  217 non-null    object
 1   program_id          217 non-null    object
 2   model_id            217 non-null    object
 3   FUNDID              217 non-null    object
 4   symbol              217 non-null    object
dtypes: object(5)
memory usage: 20.2+ KB
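
The ID columns show up as `8.0`-style floats, and later as `object`, because an outer merge introduces NaN, which a plain integer column cannot hold. One alternative to casting both sides to `object` is pandas' nullable integer dtype; a minimal sketch on made-up values:

```python
import numpy as np
import pandas as pd

# IDs as they look after an outer merge: floats, with NaN for unmatched rows.
ids = pd.Series([1.0, 8.0, np.nan])

# "Int64" (capital I) is the nullable integer dtype; NaN becomes <NA>.
ids_int = ids.astype("Int64")
print(ids_int)
```

Whether this fits depends on the `'missing'` placeholder: string placeholders would have to be converted back to NaN before the cast.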


In [174]:
missing_df = pd.merge(remainder_groupby, c_df.replace({'missing': np.nan}),  how='outer',left_on=['program_id', 'partner_program_id'], 
                     right_on = ['program_id', 'partner_program_id'])
missing_df.drop(columns='FUNDID',inplace=True)
missing_df.dropna(inplace=True)

In [176]:
missing_df

Unnamed: 0,partner_program_id,program_id,fund_agg,model_id,symbol
0,1.0,1.0,"{Diglett, Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu}",260.0,"Diglett,Torchic"
1,1.0,1.0,"{Diglett, Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu}",261.0,Diglett
2,1.0,1.0,"{Diglett, Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu}",262.0,Diglett
3,1.0,1.0,"{Diglett, Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu}",263.0,Diglett
4,1.0,1.0,"{Diglett, Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu}",264.0,Diglett
5,1.0,1.0,"{Diglett, Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu}",265.0,Diglett
6,1.0,1.0,"{Diglett, Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu}",266.0,Diglett
7,1.0,1.0,"{Diglett, Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu}",267.0,Diglett
8,1.0,1.0,"{Diglett, Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu}",268.0,Diglett
9,1.0,1.0,"{Diglett, Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu}",269.0,"Hitmontop,Smoochum,Diglett,Gastly,Teddiursa,Meowth,Sneasel,Xatu,Growlithe,Smeargle,Stantler,Tyrogue"


In [171]:
missing_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 217 entries, 0 to 216
Data columns (total 5 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   partner_program_id  146 non-null    float64
 1   program_id          208 non-null    float64
 2   fund_agg            137 non-null    object 
 3   model_id            208 non-null    float64
 4   symbol              213 non-null    object 
dtypes: float64(3), object(2)
memory usage: 10.2+ KB


In [181]:
missing_df.fund_agg = missing_df.fund_agg.apply(lambda x: list(x))

In [170]:
# missing_df.fund_agg.value_counts()

TypeError: unhashable type: 'set'

Exception ignored in: 'pandas._libs.index.IndexEngine._call_map_locations'
Traceback (most recent call last):
  File "pandas/_libs/hashtable_class_helper.pxi", line 1652, in pandas._libs.hashtable.PyObjectHashTable.map_locations
TypeError: unhashable type: 'set'


{Venusaur, Arbok, Pikachu, Charmander, Nidoqueen, Nidorino, Growlithe, Yanma, Haunter, Sandshrew, Sandslash, Spearow, Fearow, Wooper, Raichu, Raticate, Treecko, Sceptile, Wigglytuff, Grovyle, Jigglypuff, Nidorina, Ekans}                                                                                                                                                                                        14
{Charmander, VenusaurMega Venusaur, Charizard, Dewgong, Slowbro, Doduo, Shellder, CharizardMega Charizard X, Magneton, Golem, Squirtle, Ponyta, Muk, BlastoiseMega Blastoise, CharizardMega Charizard Y, Cloyster, Farfetch'd, Seel, Slowpoke, Grimer, Blastoise, Magnemite, Charmeleon, SlowbroMega Slowbro, Rapidash, Dodrio}                                                                                     13
{Arbok, Clefairy, Pikachu, Nidoqueen, Nidorino, Growlithe, Zubat, CharizardMega Charizard X, Sandshrew, Clefable, Sandslash, Spearow, Fearow, Oddish, Raichu, Raticate, Vileplume, Golbat,
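
The `TypeError` above comes from `value_counts` needing hashable values, which Python sets are not. One workaround, sketched here on toy data, is to map each set to a canonical string first:

```python
import pandas as pd

# Two of these sets hold the same funds, written in a different order.
fund_sets = pd.Series([{"B", "A"}, {"A", "B"}, {"C"}])

# Sorting before joining gives equal sets the same text.
as_text = fund_sets.map(lambda funds: ",".join(sorted(funds)))
counts = as_text.value_counts()
print(counts)
```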

In [198]:
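def what_is_missing(fund_agg, symbol):
    # `symbol` is a comma-joined string of the model's funds; otherwise treat it as a single item
    my_list = symbol.split(',') if isinstance(symbol, str) else [symbol]
    # remove every fund the model already holds (note: this mutates fund_agg in place)
    for each in my_list:
        if each in fund_agg:
            fund_agg.remove(each)
    # what remains are the partner funds missing from this model
    return fund_agg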

In [200]:
missing_df['missing']= missing_df.apply(lambda x: what_is_missing(x.fund_agg, x.symbol), axis=1)

# df['col_3'] = df.apply(lambda x: function(x.col_1, x.col_2), axis=1)

In [213]:
missing_df

Unnamed: 0,partner_program_id,program_id,fund_agg,model_id,symbol,missing
0,1.0,1.0,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]",260,"Diglett,Torchic","[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]"
1,1.0,1.0,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]",261,Diglett,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]"
2,1.0,1.0,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]",262,Diglett,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]"
3,1.0,1.0,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]",263,Diglett,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]"
4,1.0,1.0,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]",264,Diglett,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]"
5,1.0,1.0,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]",265,Diglett,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]"
6,1.0,1.0,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]",266,Diglett,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]"
7,1.0,1.0,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]",267,Diglett,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]"
8,1.0,1.0,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]",268,Diglett,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]"
9,1.0,1.0,[Torchic],269,"Hitmontop,Smoochum,Diglett,Gastly,Teddiursa,Meowth,Sneasel,Xatu,Growlithe,Smeargle,Stantler,Tyrogue",[Torchic]


#### Now combine with output df

In [217]:
missing_df = missing_df.astype({'model_id': 'object', 'program_id': 'object'})

In [218]:
missing_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 137 entries, 0 to 136
Data columns (total 6 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   partner_program_id  137 non-null    float64
 1   program_id          137 non-null    object 
 2   fund_agg            137 non-null    object 
 3   model_id            137 non-null    object 
 4   symbol              137 non-null    object 
 5   missing             137 non-null    object 
dtypes: float64(1), object(5)
memory usage: 7.5+ KB


In [219]:
output_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 188 entries, 0 to 187
Data columns (total 4 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   program_id                188 non-null    object
 1   model_id                  188 non-null    object
 2   funds_missing_at_partner  71 non-null     object
 3   funds_missing_at_vw       85 non-null     object
dtypes: object(4)
memory usage: 12.3+ KB


In [221]:
output_final = pd.merge(output_df, missing_df, how='outer',
                        on=['program_id', 'model_id'])
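
An outer merge keeps rows that appear in only one of the two frames, which is where NaN values in a combined output come from. A small sketch with made-up rows (the values are illustrative only), using the `indicator` option of `pd.merge` to show which side each row came from:

```python
import pandas as pd

# Illustrative frames only; the column names mirror the notebook, the values do not
left = pd.DataFrame({
    'program_id': [1, 1],
    'model_id': [259, 260],
    'funds_missing_at_vw': ['A', 'B'],
})
right = pd.DataFrame({
    'program_id': [1, 8],
    'model_id': [260, 65],
    'missing': [['X'], ['Y']],
})

# indicator=True adds a `_merge` column: left_only, right_only or both
merged = pd.merge(left, right, how='outer', on=['program_id', 'model_id'], indicator=True)
counts = merged['_merge'].value_counts()
print(merged)
print(counts)
```

Rows marked `left_only` or `right_only` are the ones that will carry NaN in the columns from the other frame.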

In [222]:
output_final 

Unnamed: 0,program_id,model_id,funds_missing_at_partner,funds_missing_at_vw,partner_program_id,fund_agg,symbol,missing
0,1,259,"Paras,Phanpy,Venonat,Donphan,Porygon2","MedichamMega Medicham,Manectric,ManectricMega Manectric",,,,
1,1,260,"Psyduck,Rattata,Paras,Phanpy,Venonat,Donphan","MedichamMega Medicham,Manectric,ManectricMega Manectric",1.0,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]","Diglett,Torchic","[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]"
2,1,261,"Psyduck,Rattata,Dugtrio,Paras,Phanpy,Venonat,Porygon2","MedichamMega Medicham,Manectric,ManectricMega Manectric",1.0,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]",Diglett,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]"
3,1,262,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy,Venonat","MedichamMega Medicham,Manectric,ManectricMega Manectric",1.0,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]",Diglett,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]"
4,1,263,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy","MedichamMega Medicham,Manectric,ManectricMega Manectric",1.0,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]",Diglett,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]"
5,1,264,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy","MedichamMega Medicham,Manectric,ManectricMega Manectric",1.0,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]",Diglett,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]"
6,1,265,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy","MedichamMega Medicham,Manectric,ManectricMega Manectric",1.0,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]",Diglett,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]"
7,1,266,"Persian,Psyduck,Rattata,Dugtrio,Paras","MedichamMega Medicham,Manectric,ManectricMega Manectric",1.0,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]",Diglett,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]"
8,1,267,"Persian,Psyduck,Rattata,Dugtrio,Paras","MedichamMega Medicham,Manectric,ManectricMega Manectric",1.0,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]",Diglett,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]"
9,1,268,"Persian,Psyduck,Rattata,Dugtrio","MedichamMega Medicham,Manectric,ManectricMega Manectric",1.0,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]",Diglett,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]"


In [238]:
# remainder_groupby.set_index('partner_program_id', inplace=True)
remainder_groupby

partner_program_id,program_id,fund_agg
1.0,1,"{Diglett, Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu}"
8.0,8,"{Arbok, Clefairy, Pikachu, Nidoqueen, Nidorino, Growlithe, Zubat, CharizardMega Charizard X, Sandshrew, Clefable, Sandslash, Spearow, Fearow, Oddish, Raichu, Raticate, Vileplume, Golbat, Ninetales, Vulpix, Primeape, Wigglytuff, Nidoking, Jigglypuff, Arcanine, Gloom, Nidorina, Ekans}"
9.0,9,"{Tentacool, Venusaur, Abra, Graveler, Charmander, VenusaurMega Venusaur, Zubat, Machop, Slowbro, Doduo, Kadabra, Machoke, Geodude, AlakazamMega Alakazam, Magneton, Weepinbell, Golem, Alakazam, Poliwhirl, Squirtle, Ponyta, Poliwag, BlastoiseMega Blastoise, Victreebel, Tentacruel, Farfetch'd, Slowpoke, Poliwrath, Blastoise, Magnemite, SlowbroMega Slowbro, Bellsprout, Rapidash, Machamp}"
10.0,10,"{Squirtle, Venusaur, CharizardMega Charizard X, Ivysaur, Charmander, Blastoise, VenusaurMega Venusaur, Charizard, BlastoiseMega Blastoise, Charmeleon, CharizardMega Charizard Y, Wartortle, Bulbasaur}"
12.0,12,"{Charmander, VenusaurMega Venusaur, Charizard, Dewgong, Slowbro, Doduo, Shellder, CharizardMega Charizard X, Magneton, Golem, Squirtle, Ponyta, Muk, BlastoiseMega Blastoise, CharizardMega Charizard Y, Cloyster, Farfetch'd, Seel, Slowpoke, Grimer, Blastoise, Magnemite, Charmeleon, SlowbroMega Slowbro, Rapidash, Dodrio}"
13.0,13,"{Lickitung, Haunter, Slowbro, Doduo, CharizardMega Charizard X, Geodude, Weezing, Hitmonlee, Koffing, Rhyhorn, Magneton, Ponyta, Exeggutor, Marowak, Farfetch'd, Voltorb, Slowpoke, Cubone, Exeggcute, Magnemite, SlowbroMega Slowbro, Rapidash, Electrode, Hitmonchan}"
14.0,14,"{Mew, Zapdos, Charmander, Aerodactyl, Gastly, Quilava, Bulbasaur, Moltres, Snorlax, Articuno, MewtwoMega Mewtwo X, AerodactylMega Aerodactyl, Mewtwo, Bayleef, Cyndaquil, MewtwoMega Mewtwo Y, Magikarp, Meganium, Vulpix, Dratini, Dragonite, Dragonair, Chikorita}"
15.0,15,"{Arbok, Pikachu, Nidoqueen, Nidorino, Yanma, Politoed, Umbreon, Murkrow, Misdreavus, Sandshrew, Sunflora, Jumpluff, Sandslash, Quagsire, Spearow, Skiploom, Fearow, Wooper, Raichu, Raticate, Sunkern, Magikarp, Aipom, Espeon, Slowking, Nidorina, Hoppip, Ekans}"
16.0,16,"{Girafarig, Wobbuffet, Forretress, Goldeen, CharizardMega Charizard X, Snubbull, Pineco, Totodile, SteelixMega Steelix, BlastoiseMega Blastoise, Dunsparce, VenusaurMega Venusaur, Charmeleon, CharizardMega Charizard Y, Gastly, Unown, Gligar, Snorlax}"
17.0,17,"{Pichu, Mareep, Ariados, Crobat, Cleffa, Ampharos, Lanturn, Ledian, Marill, Xatu, Spinarak, Flaaffy, Natu, Bellossom, AmpharosMega Ampharos, Togepi, Azumarill, Sudowoodo, Togetic, Igglybuff, Chinchou, Chikorita}"


In [241]:
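# Populate NaN in the missing column
def get_missing_missing(partner_program_id, missing):
    # Lists and sets are not NaN; keep them as they are
    if not isinstance(missing, float) or not math.isnan(missing):
        return missing
    try:
        return remainder_groupby.at[partner_program_id, 'fund_agg']
    except KeyError:
        # No partner funds recorded for this program; leave the NaN in place
        return missing

In [242]:
# program_id is passed as the lookup key into remainder_groupby
output_final['missing'] = output_final.apply(lambda x: get_missing_missing(x.program_id, x.missing), axis=1)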
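
The same fill can be written without relying on exceptions for control flow. A minimal sketch, assuming the lookup is available as a plain dict from program id to fund set; `is_nan`, `fill_missing` and `funds_by_program` are hypothetical names, and the sample values are illustrative:

```python
import math


def is_nan(value):
    """True only for float NaN; lists, sets and strings are never NaN."""
    return isinstance(value, float) and math.isnan(value)


def fill_missing(program_ids, missing_values, funds_by_program):
    """Replace NaN entries with the fund set of the matching program, if one exists."""
    return [
        funds_by_program.get(pid, value) if is_nan(value) else value
        for pid, value in zip(program_ids, missing_values)
    ]


funds_by_program = {1: {'Diglett', 'Xatu'}}   # hypothetical lookup
program_ids = [1, 1, 99]
missing_values = [math.nan, ['Xatu'], math.nan]

filled = fill_missing(program_ids, missing_values, funds_by_program)
print(filled)
```

Programs with no entry in the lookup keep their NaN, which matches the NaN values that remain in the `missing` column after the fill.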

In [244]:
output_final.isna().sum()

program_id                    0
model_id                      0
funds_missing_at_partner    117
funds_missing_at_vw         103
partner_program_id           51
fund_agg                     51
symbol                       51
missing                      49
dtype: int64

In [245]:
output_final

Unnamed: 0,program_id,model_id,funds_missing_at_partner,funds_missing_at_vw,partner_program_id,fund_agg,symbol,missing
0,1,259,"Paras,Phanpy,Venonat,Donphan,Porygon2","MedichamMega Medicham,Manectric,ManectricMega Manectric",,,,"{Diglett, Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu}"
1,1,260,"Psyduck,Rattata,Paras,Phanpy,Venonat,Donphan","MedichamMega Medicham,Manectric,ManectricMega Manectric",1.0,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]","Diglett,Torchic","[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]"
2,1,261,"Psyduck,Rattata,Dugtrio,Paras,Phanpy,Venonat,Porygon2","MedichamMega Medicham,Manectric,ManectricMega Manectric",1.0,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]",Diglett,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]"
3,1,262,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy,Venonat","MedichamMega Medicham,Manectric,ManectricMega Manectric",1.0,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]",Diglett,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]"
4,1,263,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy","MedichamMega Medicham,Manectric,ManectricMega Manectric",1.0,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]",Diglett,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]"
5,1,264,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy","MedichamMega Medicham,Manectric,ManectricMega Manectric",1.0,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]",Diglett,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]"
6,1,265,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy","MedichamMega Medicham,Manectric,ManectricMega Manectric",1.0,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]",Diglett,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]"
7,1,266,"Persian,Psyduck,Rattata,Dugtrio,Paras","MedichamMega Medicham,Manectric,ManectricMega Manectric",1.0,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]",Diglett,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]"
8,1,267,"Persian,Psyduck,Rattata,Dugtrio,Paras","MedichamMega Medicham,Manectric,ManectricMega Manectric",1.0,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]",Diglett,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]"
9,1,268,"Persian,Psyduck,Rattata,Dugtrio","MedichamMega Medicham,Manectric,ManectricMega Manectric",1.0,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]",Diglett,"[Stantler, Sneasel, Tyrogue, Teddiursa, Smoochum, Torchic, Smeargle, Hitmontop, Gastly, Growlithe, Meowth, Xatu]"


#### Add missing column to funds missing at vw and delete other columns


In [259]:
def combine_vw_missing_and_missing_col(funds_missing_at_vw, missing):
    def is_nan(value):
        return isinstance(value, float) and math.isnan(value)

    # Nothing to add: keep the existing value (which may itself be NaN)
    if is_nan(missing):
        return funds_missing_at_vw
    extra = ','.join(missing)
    if is_nan(funds_missing_at_vw):
        return extra
    # Separate the existing string from the appended funds with a comma
    return funds_missing_at_vw + ',' + extra

In [260]:
output_final.funds_missing_at_vw = output_final.apply(lambda x: combine_vw_missing_and_missing_col(x.funds_missing_at_vw, x.missing), axis=1)


In [262]:
output_final.columns

Index(['program_id', 'model_id', 'funds_missing_at_partner',
       'funds_missing_at_vw', 'partner_program_id', 'fund_agg', 'symbol',
       'missing'],
      dtype='object')

In [264]:
output_final.loc[:,['program_id', 'model_id', 'funds_missing_at_partner',
       'funds_missing_at_vw']]

Unnamed: 0,program_id,model_id,funds_missing_at_partner,funds_missing_at_vw
0,1,259,"Paras,Phanpy,Venonat,Donphan,Porygon2","MedichamMega Medicham,Manectric,ManectricMega Manectric,Diglett,Stantler,Sneasel,Tyrogue,Teddiursa,Smoochum,Torchic,Smeargle,Hitmontop,Gastly,Growlithe,Meowth,Xatu"
1,1,260,"Psyduck,Rattata,Paras,Phanpy,Venonat,Donphan","MedichamMega Medicham,Manectric,ManectricMega Manectric,Stantler,Sneasel,Tyrogue,Teddiursa,Smoochum,Smeargle,Hitmontop,Gastly,Growlithe,Meowth,Xatu"
2,1,261,"Psyduck,Rattata,Dugtrio,Paras,Phanpy,Venonat,Porygon2","MedichamMega Medicham,Manectric,ManectricMega Manectric,Stantler,Sneasel,Tyrogue,Teddiursa,Smoochum,Torchic,Smeargle,Hitmontop,Gastly,Growlithe,Meowth,Xatu"
3,1,262,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy,Venonat","MedichamMega Medicham,Manectric,ManectricMega Manectric,Stantler,Sneasel,Tyrogue,Teddiursa,Smoochum,Torchic,Smeargle,Hitmontop,Gastly,Growlithe,Meowth,Xatu"
4,1,263,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy","MedichamMega Medicham,Manectric,ManectricMega Manectric,Stantler,Sneasel,Tyrogue,Teddiursa,Smoochum,Torchic,Smeargle,Hitmontop,Gastly,Growlithe,Meowth,Xatu"
5,1,264,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy","MedichamMega Medicham,Manectric,ManectricMega Manectric,Stantler,Sneasel,Tyrogue,Teddiursa,Smoochum,Torchic,Smeargle,Hitmontop,Gastly,Growlithe,Meowth,Xatu"
6,1,265,"Persian,Psyduck,Rattata,Dugtrio,Paras,Phanpy","MedichamMega Medicham,Manectric,ManectricMega Manectric,Stantler,Sneasel,Tyrogue,Teddiursa,Smoochum,Torchic,Smeargle,Hitmontop,Gastly,Growlithe,Meowth,Xatu"
7,1,266,"Persian,Psyduck,Rattata,Dugtrio,Paras","MedichamMega Medicham,Manectric,ManectricMega Manectric,Stantler,Sneasel,Tyrogue,Teddiursa,Smoochum,Torchic,Smeargle,Hitmontop,Gastly,Growlithe,Meowth,Xatu"
8,1,267,"Persian,Psyduck,Rattata,Dugtrio,Paras","MedichamMega Medicham,Manectric,ManectricMega Manectric,Stantler,Sneasel,Tyrogue,Teddiursa,Smoochum,Torchic,Smeargle,Hitmontop,Gastly,Growlithe,Meowth,Xatu"
9,1,268,"Persian,Psyduck,Rattata,Dugtrio","MedichamMega Medicham,Manectric,ManectricMega Manectric,Stantler,Sneasel,Tyrogue,Teddiursa,Smoochum,Torchic,Smeargle,Hitmontop,Gastly,Growlithe,Meowth,Xatu"
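
When one comma-separated fund list is appended to another, it helps to go through a real list so that the separator and any duplicates are handled in one place. A minimal sketch, assuming funds are plain strings with no embedded commas; `merge_fund_strings` is a hypothetical helper, not part of the notebook:

```python
def merge_fund_strings(existing, extra_funds):
    """Join a comma-separated fund string with extra funds, dropping duplicates.

    `existing` may be a non-string placeholder such as NaN, in which case
    only the extra funds are returned.
    """
    funds = existing.split(',') if isinstance(existing, str) and existing else []
    for fund in extra_funds:
        if fund not in funds:
            funds.append(fund)
    return ','.join(funds)


combined = merge_fund_strings('Manectric,Medicham', ['Diglett', 'Manectric'])
print(combined)
```

Order is preserved: the existing funds come first, followed by any new ones in the order they were given.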


In [283]:
c_df[c_df.program_id=='missing']

Unnamed: 0,partner_program_id,program_id,model_id,FUNDID,symbol
10,1,missing,missing,"MedichamMega Medicham,Manectric,ManectricMega Manectric","missing,missing,missing"
24,8,missing,missing,Carvanha,missing
35,9,missing,missing,Sharpedo,missing
74,15,missing,missing,"Charmander,SharpedoMega Sharpedo,CharizardMega Charizard Y,BlastoiseMega Blastoise,CharizardMega Charizard X,Wailmer","missing,missing,missing,missing,missing,missing"
105,24,missing,missing,Wailord,missing
117,27,missing,missing,"Plusle,Sharpedo","missing,missing"
119,28,missing,missing,"Bulbasaur,Plusle","missing,missing"
130,29,missing,missing,"MedichamMega Medicham,Minun,Volbeat,Illumise,Venomoth,Roselia,Parasect,Golduck,Gulpin","missing,missing,missing,missing,missing,missing,missing,missing,missing"
145,30,missing,missing,Numel,missing


In [253]:
partner[partner.partner_program_id==1]

Unnamed: 0,PLANID,PLANINVCLOSEDATE,FUNDID,partner_program_id
427,VWPALL000076,active,Medicham,1
428,VWPALL000076,active,Hitmontop,1
429,VWPALL000076,active,Smoochum,1
430,VWPALL000076,active,Diglett,1
431,VWPALL000076,01/15/2019,MedichamMega Medicham,1
432,VWPALL000076,active,Gastly,1
433,VWPALL000076,active,Teddiursa,1
434,VWPALL000076,active,Meowth,1
435,VWPALL000076,active,Sneasel,1
436,VWPALL000076,active,Electrike,1


In [249]:
model

Unnamed: 0,PLANID,PLANINVCLOSEDATE,FUNDID,partner_program_id
0,VW0008000039,active,Medicham,8
1,VW0008000039,active,Arcanine,8
2,VW0008000039,11/01/2018,Clefairy,8
3,VW0008000039,active,Zubat,8
4,VW0008000039,active,Nidoking,8
5,VW0008000039,active,Jigglypuff,8
6,VW0008000039,active,CharizardMega Charizard X,8
7,VW0008000039,active,Electrike,8
8,VW0008000039,active,Growlithe,8
9,VW0008000039,active,Fearow,8


In [259]:
model_prop[model_prop.model_id==268]

Unnamed: 0,model_props_id,model_id,symbol
523,1899,268,Persian
524,1897,268,Diglett
525,1900,268,Psyduck
526,1896,268,Rattata
527,1898,268,Dugtrio


In [260]:
model[model.model_id==268]

Unnamed: 0,model_id,program_id
14,268,1


In [222]:
combined_df

Unnamed: 0,PLANID,PLANINVCLOSEDATE,FUNDID,partner_program_id,model_id,program_id,model_props_id,symbol
1,VW0008000039,active,Arcanine,8,65,8,311,Arcanine
2,VW0008000039,11/01/2018,Clefairy,8,65,8,297,Clefairy
3,VW0008000039,active,Zubat,8,65,8,304,Zubat
4,VW0008000039,active,Nidoking,8,65,8,296,Nidoking
5,VW0008000039,active,Jigglypuff,8,65,8,302,Jigglypuff
6,VW0008000039,active,CharizardMega Charizard X,8,65,8,299,CharizardMega Charizard X
8,VW0008000039,active,Growlithe,8,65,8,310,Growlithe
9,VW0008000039,active,Fearow,8,55,8,286,Fearow
10,VW0008000039,active,Ekans,8,56,8,287,Ekans
11,VW0008000039,active,Arbok,8,57,8,288,Arbok


## Step 4 - Check for any closed funds
Check each `model_id` in each `program_id` in Vestwell's data to see if our partner has indicated that a fund has closed.  Closed funds that aren't in Vestwell's data can be ignored.  Add this information to your output from Step 3.
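
One possible approach, sketched under stated assumptions: the input is a frame shaped like the `combined_df` shown above (partner rows already matched to a Vestwell `symbol`), and `PLANINVCLOSEDATE` holds either a date string or the placeholder `active`, as in the partner output earlier. The column name `funds_closed` is hypothetical.

```python
import pandas as pd

# A few rows in the shape of combined_df; only the columns needed here
combined = pd.DataFrame({
    'program_id': [8, 8, 8],
    'model_id': [65, 65, 55],
    'symbol': ['Arcanine', 'Clefairy', 'Fearow'],
    'PLANINVCLOSEDATE': ['active', '11/01/2018', 'active'],
})

# A fund counts as closed when the partner recorded a close date
is_closed = combined['PLANINVCLOSEDATE'].notna() & (combined['PLANINVCLOSEDATE'] != 'active')
closed = combined[is_closed]

# One row per (program_id, model_id) with its closed funds as a comma-separated string
closed_by_model = (
    closed.groupby(['program_id', 'model_id'])['symbol']
    .agg(lambda s: ','.join(sorted(set(s))))
    .rename('funds_closed')
    .reset_index()
)
print(closed_by_model)
```

Because the input only contains rows that matched a Vestwell `symbol`, closed funds that Vestwell does not hold never enter the result. The grouped frame could then be left-merged onto the Step 3 output on `program_id` and `model_id`, provided the key dtypes agree.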