# Vestwell Python Screener

If you have any questions or feel you are making assumptions, please record them in this notebook or in comments if you'd rather work in a `.py` file.  If you get stuck, try to explain in words how you would complete the exercise.

### Background

Vestwell provides a wide variety of investment choices to its users.  Participants in a retirement plan can choose between a pre-determined set of funds or they can choose their own custom set of funds from a list of choices.  Advisors can create their own models with a custom set of funds in which participants can choose to invest.  As a result, there are thousands of unique models on the Vestwell platform.  

One of Vestwell's partners has the same list of models in its database.  This partner will maintain an up-to-date list of funds for each model in their database.  For example, when a fund closes and is replaced by a new one, Vestwell's partner will update the model with the new fund in their database, but not in Vestwell's.  For this reason, Vestwell's database and our partner's database will get out of sync over time.  Unless, that is, you can build a Python script to reconcile the two databases.  We're rooting for you!

### The Data

Here's a high-level overview of the data.  We'll get into more details below as we dig in.

**Vestwell Data**

Each `program_id` has many `model_id`s.  Each `model_id` has many `symbol`s.

* model.csv:  Associations of programs to models.
* model_prop.csv:  Association of models to symbols.

**Partner Data**

Each `PLANID` has many `FUNDID`s.

* partner.csv:  Association of `PLANID` to `FUNDID` and `PLANINVCLOSEDATE`.  

**Some extra notes**

* The `FUNDID` in our partner's data is equivalent to `symbol` in Vestwell's data.  These are also referred to as "funds".
* The `PLANID` in our partner's data has information that is equivalent to the `program_id` in Vestwell's database (more details below in Step 2).
* The `PLANINVCLOSEDATE` in our partner's database is the date when a fund was closed.  If there isn't a date, then the fund has not been closed.
* Sometimes our partner has funds called either "Medicham" or "Electrike" which we ignore.
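
The last note can be sketched as a simple filter. This is a minimal illustration on made-up rows, not the real `partner.csv`. It assumes the two names should be matched exactly, so a fund such as "MedichamMega Medicham" is kept; the notes do not say whether such variants should also be ignored, so treat that as an open question to record.

```python
import pandas as pd

# Funds the notes say to ignore (exact names only; an assumption).
IGNORED_FUNDS = {"Medicham", "Electrike"}

# Toy rows standing in for partner.csv; the values are illustrative.
partner_sample = pd.DataFrame({
    "PLANID": ["VW0008000039"] * 4,
    "FUNDID": ["Medicham", "Arcanine", "Electrike", "MedichamMega Medicham"],
})

# isin() compares whole values, so "MedichamMega Medicham" is not dropped.
kept = partner_sample[~partner_sample["FUNDID"].isin(IGNORED_FUNDS)]
kept_funds = kept["FUNDID"].tolist()
print(kept_funds)  # ['Arcanine', 'MedichamMega Medicham']
```

Filtering the partner rows before any comparison keeps the ignored funds out of every later step, rather than dropping them from a merged table afterwards.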

### Goal
The goal of this exercise is to compare Vestwell's data with our partner's data.  We want to figure out if Vestwell's model data is the same as our partner's data.  We consider our partner's database the source of truth since their database will remain updated if there are any changes to funds.  Here's specifically what we are asking:

1.  Does the list of funds for each `model_id` in each `program_id` in Vestwell's database match the list of funds in our partner's `program_id`?  If there are any mismatches, what funds are missing from each database?  Remember, our partner doesn't use `model_id` so whichever funds they have for a particular `program_id` should match the list of funds that Vestwell has for any `model_id` that uses that `program_id`.

For example, if Vestwell's database has funds A, B and C for a `model_id` in a particular `program_id` and our partner's database has funds B, C, and D for the same `program_id` we would report that fund A is missing from our partner's database and that fund D is missing from our database for that `model_id` and `program_id`.

2.  Are there any funds in Vestwell's database that have closed?  If so, what are they for each `model_id` in each `program_id`?

For example, if our database has funds D, E, and F for a certain `model_id` in a `program_id` and our partner's database shows that fund D closed on 11/1/2019, then we would report that fund D has closed for that `model_id` and `program_id`.
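
Both worked examples above reduce to small set operations. The sketch below reproduces them in plain Python. The helper `parse_close_date` is a hypothetical name, and it assumes close dates are written as month/day/year and that any value that does not parse (such as the "active" marker seen in the sample rows) means the fund is still open.

```python
from datetime import date, datetime
from typing import Optional


def parse_close_date(value) -> Optional[date]:
    """Return the close date, or None when the fund is treated as open."""
    if value is None:
        return None
    try:
        return datetime.strptime(str(value).strip(), "%m/%d/%Y").date()
    except ValueError:
        # "active", an empty string, or anything else that is not a date
        return None


# Question 1: funds A, B, C at Vestwell versus B, C, D at the partner.
vestwell_funds = {"A", "B", "C"}
partner_funds = {"B", "C", "D"}
missing_at_partner = sorted(vestwell_funds - partner_funds)
missing_at_vw = sorted(partner_funds - vestwell_funds)

# Question 2: funds D, E, F at Vestwell; the partner shows D closed on 11/1/2019.
model_funds = ["D", "E", "F"]
partner_close_dates = {"D": "11/1/2019", "E": "active", "F": None}
closed_funds = [
    fund for fund in model_funds
    if parse_close_date(partner_close_dates.get(fund)) is not None
]

print(missing_at_partner)  # ['A']
print(missing_at_vw)       # ['D']
print(closed_funds)        # ['D']
```

Treating unparseable values as "open" is convenient but can hide malformed dates, so it may be worth logging them instead of silently ignoring them.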

Ideally, the output is in a form that can be passed to a Business Analyst to take action on.  For example, the output could look something like this:

| program_id | model_id | fund_missing_at_vw | fund_missing_at_partner | fund_closed |
|------------|--|---------------|--------------------|--------|
| 1          |1| None          | None               | None   |
| 1          |2| D, Z             | A                  | F   |
| 2          |2| None             | None               | F      |

### Some Hints
* We value correct output over efficient code.  
* Does your code execute fully without errors?
* What edge cases have you considered?  How could you handle them?
* Could another engineer read your code and easily understand what's going on and why you did things a certain way?
* Most analysts don't use Python or Jupyter Notebooks.  How could you give them the output of your code?
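
One possible answer to the last hint is a CSV file that opens directly in a spreadsheet. The sketch below uses illustrative rows that mirror the example table, not real results. `format_funds` and the file name mentioned in the comment are hypothetical.

```python
import io

import pandas as pd


def format_funds(funds):
    """Join fund names for display; an empty collection becomes 'None'."""
    return ", ".join(sorted(funds)) if funds else "None"


# Illustrative rows mirroring the example table; not real results.
report = pd.DataFrame({
    "program_id": [1, 1, 2],
    "model_id": [1, 2, 2],
    "fund_missing_at_vw": [format_funds([]), format_funds(["Z", "D"]), format_funds([])],
    "fund_missing_at_partner": [format_funds([]), format_funds(["A"]), format_funds([])],
    "fund_closed": [format_funds([]), format_funds(["F"]), format_funds(["F"])],
})

# An in-memory buffer keeps the sketch self-contained; pass a file path
# such as "reconciliation_report.csv" (hypothetical name) to write to disk.
buffer = io.StringIO()
report.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Joining each list into a single string before export avoids Python list syntax such as `['D', 'Z']` appearing in the analyst's spreadsheet.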

You should find the following mismatches for `program_id` 1 and `model_id` 268:
* Funds missing at Vestwell: 
    
        'Hitmontop',
        'Smoochum',
        'Gastly',
        'Teddiursa',
        'Meowth',
        'Sneasel',
        'Xatu',
        'Growlithe',
        'Torchic',
        'ManectricMega Manectric',
        'Smeargle',
        'Stantler',
        'Tyrogue'
    
    
* Funds missing at Partner:
    
        'Persian', 'Psyduck', 'Rattata', 'Dugtrio'
    
    
* Closed funds:

        None

## Step 0
Import any packages you'll need

In [1]:
import pandas as pd
import numpy as np
import math

## Step 1
Import `partner.csv`, `model.csv`, and `model_prop.csv`.

In [2]:
model = pd.read_csv('data_V2/model.csv')
model.head()

Unnamed: 0,model_id,program_id
0,28,3
1,34,4
2,42,4
3,24,3
4,64,8


In [5]:
model.shape

(187, 2)

In [4]:
model_prop = pd.read_csv('data_V2/model_props.csv')
model_prop.head()

Unnamed: 0,model_props_id,model_id,symbol
0,541,80,Bulbasaur
1,542,80,Ivysaur
2,543,80,Venusaur
3,544,80,VenusaurMega Venusaur
4,545,80,Charmander


In [6]:
model_prop.shape

(729, 3)

In [3]:
partner = pd.read_csv('data_V2/partner.csv')
partner.head()

Unnamed: 0,PLANID,PLANINVCLOSEDATE,FUNDID
0,VW0008000039,active,Medicham
1,VW0008000039,active,Arcanine
2,VW0008000039,11/01/2018,Clefairy
3,VW0008000039,active,Zubat
4,VW0008000039,active,Nidoking


In [7]:
partner.shape

(445, 3)

## Step 2 - working with the `partner.csv` data
Extract the `program_id` from the `PLANID` column in the `partner` dataframe.  The `program_id` is the first four characters in `PLANID` after "VW".  It's usually an integer.  If instead of digits, those characters are equal to "PALL" then the `program_id` = 1.  Drop any other rows remaining that do not have four digits in the first four characters after "VW" in the `PLANID` column.  For example, if a row in `PLANID` has `VWLASP000` then it should be dropped because it has `LASP` after `VW` instead of four digits.
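
The rule above can also be written as a regular expression, which enforces "exactly four digits" more strictly than `int()` does (`int()` also accepts strings such as `"+123"` or `" 12 "`). This is a sketch under that reading of the rule; `extract_program_id` is a hypothetical helper name.

```python
import re
from typing import Optional

# "VW" followed by either the literal PALL or exactly four digits.
_PLAN_PATTERN = re.compile(r"^VW(PALL|\d{4})")


def extract_program_id(plan_id: str) -> Optional[int]:
    """Return the program_id encoded in a PLANID, or None if the row should be dropped."""
    match = _PLAN_PATTERN.match(plan_id)
    if match is None:
        return None
    code = match.group(1)
    return 1 if code == "PALL" else int(code)


print(extract_program_id("VW0008000039"))  # 8
print(extract_program_id("VWPALL000076"))  # 1
print(extract_program_id("VWLASP000"))     # None
```

With pandas, the function could be applied via `partner["PLANID"].apply(extract_program_id)`, followed by dropping the rows where the result is missing.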

In [8]:
# Extracts the program number from partner.PLANID; returns NaN for any value
# that is neither 'PALL' nor numeric.
def get_plan(plan_id):
    if plan_id[2:6] == 'PALL':
        return 1
    # int() removes the leading zeros, e.g. '0008' -> 8
    try:
        return int(plan_id[2:6])
    # non-numeric values become NaN so the row can be dropped later
    except ValueError:
        return np.nan

In [11]:
partner.head(10)

Unnamed: 0,PLANID,PLANINVCLOSEDATE,FUNDID,partner_program_id
0,VW0008000039,active,Medicham,0
1,VW0008000039,active,Arcanine,0
2,VW0008000039,11/01/2018,Clefairy,0
3,VW0008000039,active,Zubat,0
4,VW0008000039,active,Nidoking,0
5,VW0008000039,active,Jigglypuff,0
6,VW0008000039,active,CharizardMega Charizard X,0
7,VW0008000039,active,Electrike,0
8,VW0008000039,active,Growlithe,0
9,VW0008000039,active,Fearow,0


In [12]:
partner['partner_program_id'] = 0

In [13]:
partner.partner_program_id = partner.PLANID.apply(get_plan)

In [18]:
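# drop rows where no program_id could be extracted (only that column is checked,
# so a missing close date does not remove a row)
partner.dropna(subset=['partner_program_id'], inplace=True)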

In [21]:
partner.partner_program_id.value_counts()

27    39
9     36
15    35
8     31
12    28
30    26
21    25
14    25
13    25
20    24
17    24
29    22
19    20
16    19
1     18
28    17
24    16
10    15
Name: partner_program_id, dtype: int64

In [19]:
partner.shape

(445, 4)

In [20]:
partner.head(10)

Unnamed: 0,PLANID,PLANINVCLOSEDATE,FUNDID,partner_program_id
0,VW0008000039,active,Medicham,8
1,VW0008000039,active,Arcanine,8
2,VW0008000039,11/01/2018,Clefairy,8
3,VW0008000039,active,Zubat,8
4,VW0008000039,active,Nidoking,8
5,VW0008000039,active,Jigglypuff,8
6,VW0008000039,active,CharizardMega Charizard X,8
7,VW0008000039,active,Electrike,8
8,VW0008000039,active,Growlithe,8
9,VW0008000039,active,Fearow,8


## Step 3
Check if the funds match for each `program_id`.  In `partner.csv` the funds are in the `FUNDID` column and for `model_prop.csv` the funds are in the `symbol` column.  If there are any mismatches, return a list of which funds are missing from each database for each `model_id` in each `program_id`.
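
One way to read this step is as a per-model set comparison: build the partner's fund set for each program once, then compare it with each model's fund set. The sketch below runs on toy data. `find_mismatches` is a hypothetical helper, and the column names (`partner_program_id`, `FUNDID`, `symbol`) follow the ones used in this notebook. Note that an outer join on fund and program alone would treat a partner fund as matched if any model in the program holds it, which is why the comparison here is done model by model.

```python
import pandas as pd


def find_mismatches(vw: pd.DataFrame, partner: pd.DataFrame) -> pd.DataFrame:
    """Compare each (program_id, model_id) fund set with the partner's program fund set."""
    partner_sets = partner.groupby("partner_program_id")["FUNDID"].apply(set).to_dict()
    rows = []
    for (program_id, model_id), symbols in vw.groupby(["program_id", "model_id"])["symbol"]:
        vw_funds = set(symbols)
        # A program the partner does not list at all yields an empty set.
        partner_funds = partner_sets.get(program_id, set())
        rows.append({
            "program_id": program_id,
            "model_id": model_id,
            "fund_missing_at_vw": sorted(partner_funds - vw_funds),
            "fund_missing_at_partner": sorted(vw_funds - partner_funds),
        })
    return pd.DataFrame(rows)


# Toy data: one program with two models.
vw_sample = pd.DataFrame({
    "program_id": [1, 1, 1, 1, 1, 1],
    "model_id": [10, 10, 10, 11, 11, 11],
    "symbol": ["A", "B", "C", "B", "C", "D"],
})
partner_sample = pd.DataFrame({
    "partner_program_id": [1, 1, 1],
    "FUNDID": ["B", "C", "D"],
})

result = find_mismatches(vw_sample, partner_sample)
print(result)
```

In this toy run, model 10 reports fund D missing at Vestwell and fund A missing at the partner, while model 11 matches exactly.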

### Merge model and model_prop df on model_id column

In [22]:
vw_df = pd.merge(model, model_prop,  how='left', left_on=['model_id'], 
                     right_on = ['model_id'])

#### Note no model_ids were dropped using left join

In [36]:
model.model_id.value_counts().count()

187

In [34]:
model_prop.model_id.value_counts().count()

187

#### Looking at values (delete later)

In [30]:
model[model.model_id==28]

Unnamed: 0,model_id,program_id
0,28,3


In [29]:
model_prop[model_prop.model_id==28]

Unnamed: 0,model_props_id,model_id,symbol
13,119,28,Caterpie
14,120,28,Metapod
15,121,28,Butterfree
16,122,28,Weedle
17,123,28,Kakuna
26,124,28,Beedrill
27,125,28,BeedrillMega Beedrill
28,126,28,Pidgey
29,127,28,Pidgeotto
30,128,28,Pidgeot


In [32]:
vw_df.head(10)

Unnamed: 0,model_id,program_id,model_props_id,symbol
0,28,3,119,Caterpie
1,28,3,120,Metapod
2,28,3,121,Butterfree
3,28,3,122,Weedle
4,28,3,123,Kakuna
5,28,3,124,Beedrill
6,28,3,125,BeedrillMega Beedrill
7,28,3,126,Pidgey
8,28,3,127,Pidgeotto
9,28,3,128,Pidgeot


In [24]:
vw_df.shape

(729, 4)

In [25]:
vw_df.isnull().sum()

model_id          0
program_id        0
model_props_id    0
symbol            0
dtype: int64

### Now combine vw_df with partner

In [37]:
combined_df = pd.merge(partner, vw_df,  how='outer', left_on=['FUNDID', 'partner_program_id'], 
                     right_on = ['symbol','program_id'])
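
As a side note, `pd.merge` accepts `indicator=True`, which adds a `_merge` column saying which side each row came from. That can replace checking for missing values after the join. A small sketch on toy frames (the sample values are illustrative):

```python
import pandas as pd

partner_sample = pd.DataFrame({
    "partner_program_id": [8, 8],
    "FUNDID": ["Arcanine", "Zubat"],
})
vw_sample = pd.DataFrame({
    "program_id": [8, 8],
    "model_id": [65, 65],
    "symbol": ["Arcanine", "Ekans"],
})

merged = pd.merge(
    partner_sample, vw_sample, how="outer",
    left_on=["FUNDID", "partner_program_id"],
    right_on=["symbol", "program_id"],
    indicator=True,  # adds a _merge column: left_only, right_only, or both
)

only_partner = merged.loc[merged["_merge"] == "left_only", "FUNDID"].tolist()
only_vw = merged.loc[merged["_merge"] == "right_only", "symbol"].tolist()
print(only_partner)  # ['Zubat']
print(only_vw)       # ['Ekans']
```

The indicator only describes the join keys used here (fund and program), so it does not by itself tell which individual model lacks a partner fund.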

In [38]:
combined_df

Unnamed: 0,PLANID,PLANINVCLOSEDATE,FUNDID,partner_program_id,model_id,program_id,model_props_id,symbol
0,VW0008000039,active,Medicham,8.0,,,,
1,VW0008000039,active,Arcanine,8.0,65.0,8.0,311.0,Arcanine
2,VW0008000039,11/01/2018,Clefairy,8.0,65.0,8.0,297.0,Clefairy
3,VW0008000039,active,Zubat,8.0,65.0,8.0,304.0,Zubat
4,VW0008000039,active,Nidoking,8.0,65.0,8.0,296.0,Nidoking
...,...,...,...,...,...,...,...,...
780,,,,,278.0,26.0,1962.0,Metapod
781,,,,,279.0,26.0,1963.0,Butterfree
782,,,,,280.0,26.0,1964.0,Weedle
783,,,,,281.0,26.0,1965.0,Kakuna


In [39]:
combined_df.shape

(785, 8)

In [40]:
combined_df.program_id.value_counts().sum()

729

In [41]:
combined_df.partner_program_id.value_counts().sum()

509

### Drop rows with "Medicham" or "Electrike"

In [42]:
dropindex = combined_df[combined_df['FUNDID'] == 'Medicham' ].index

In [43]:
#delete later
dropindex

Int64Index([0, 76, 91, 145, 223, 272, 305, 331, 347, 396, 443, 482], dtype='int64')

In [44]:
combined_df.drop(dropindex, inplace=True)

In [45]:
dropindex = combined_df[combined_df['FUNDID'] == 'Electrike' ].index

In [46]:
#delete later
dropindex

Int64Index([  7,  61,  88, 114, 131, 152, 185, 220, 229, 262, 295, 320, 342,
            372, 408, 430, 450, 500],
           dtype='int64')

In [47]:
combined_df.drop(dropindex, inplace=True)

### Looking at combined_df table

In [91]:
combined_df

Unnamed: 0,PLANID,PLANINVCLOSEDATE,FUNDID,partner_program_id,model_id,program_id,model_props_id,symbol
1,VW0008000039,active,Arcanine,8,65,8,311,Arcanine
2,VW0008000039,11/01/2018,Clefairy,8,65,8,297,Clefairy
3,VW0008000039,active,Zubat,8,65,8,304,Zubat
4,VW0008000039,active,Nidoking,8,65,8,296,Nidoking
5,VW0008000039,active,Jigglypuff,8,65,8,302,Jigglypuff
6,VW0008000039,active,CharizardMega Charizard X,8,65,8,299,CharizardMega Charizard X
8,VW0008000039,active,Growlithe,8,65,8,310,Growlithe
9,VW0008000039,active,Fearow,8,55,8,286,Fearow
10,VW0008000039,active,Ekans,8,56,8,287,Ekans
11,VW0008000039,active,Arbok,8,57,8,288,Arbok


In [49]:
combined_df.columns

Index(['PLANID', 'PLANINVCLOSEDATE', 'FUNDID', 'partner_program_id',
       'model_id', 'program_id', 'model_props_id', 'symbol'],
      dtype='object')

In [78]:
combined_df.fillna('missing', inplace=True)

In [52]:
df_3 = combined_df[['FUNDID', 'partner_program_id','symbol', 'program_id', 'model_id', 'model_props_id']]

In [61]:
pd.set_option('display.max_rows', 800)

In [62]:
df_3.shape

(755, 6)

In [63]:
df_3.head(700)

Unnamed: 0,FUNDID,partner_program_id,symbol,program_id,model_id,model_props_id
1,Arcanine,8.0,Arcanine,8.0,65.0,311.0
2,Clefairy,8.0,Clefairy,8.0,65.0,297.0
3,Zubat,8.0,Zubat,8.0,65.0,304.0
4,Nidoking,8.0,Nidoking,8.0,65.0,296.0
5,Jigglypuff,8.0,Jigglypuff,8.0,65.0,302.0
6,CharizardMega Charizard X,8.0,CharizardMega Charizard X,8.0,65.0,299.0
8,Growlithe,8.0,Growlithe,8.0,65.0,310.0
9,Fearow,8.0,Fearow,8.0,55.0,286.0
10,Ekans,8.0,Ekans,8.0,56.0,287.0
11,Arbok,8.0,Arbok,8.0,57.0,288.0


### Try groupby

In [73]:
df_3.columns

Index(['FUNDID', 'partner_program_id', 'symbol', 'program_id', 'model_id',
       'model_props_id'],
      dtype='object')

In [79]:
df_3.groupby(['FUNDID', 'partner_program_id', 'symbol', 'program_id', 'model_id']).size().reset_index()

Unnamed: 0,FUNDID,partner_program_id,symbol,program_id,model_id,0
0,Abra,9,Abra,9,75,1
1,Aerodactyl,14,Aerodactyl,14,113,1
2,AerodactylMega Aerodactyl,14,AerodactylMega Aerodactyl,14,113,1
3,Aipom,15,Aipom,15,163,1
4,Alakazam,9,Alakazam,9,75,1
5,AlakazamMega Alakazam,9,AlakazamMega Alakazam,9,75,1
6,Ampharos,17,Ampharos,17,183,1
7,AmpharosMega Ampharos,17,AmpharosMega Ampharos,17,183,1
8,Arbok,8,Arbok,8,57,1
9,Arbok,15,Arbok,15,155,1


In [83]:
df_3.groupby(['FUNDID', 'partner_program_id', 'model_id']).size().reset_index()

Unnamed: 0,FUNDID,partner_program_id,model_id,0
0,Abra,9,75,1
1,Aerodactyl,14,113,1
2,AerodactylMega Aerodactyl,14,113,1
3,Aipom,15,163,1
4,Alakazam,9,75,1
5,AlakazamMega Alakazam,9,75,1
6,Ampharos,17,183,1
7,AmpharosMega Ampharos,17,183,1
8,Arbok,8,57,1
9,Arbok,15,155,1


### Create output_df

In [229]:
#note - If partner.program_id not in VW db then we ignore missing fund.
output_df = df_3.groupby(['program_id','model_id']).size().reset_index()

In [230]:
output_df.drop(columns=0, inplace=True)
output_df['fund_missing_at_vw'] = 'None'
output_df['fund_missing_at_partner'] = 'None' 
#drop the last row with missing values
drop_idx = output_df[(output_df.program_id=='missing')&(output_df.model_id=='missing')].index
output_df.drop(drop_idx, inplace=True)     #drop the last row with missing values

# output_df.set_index('model_id', inplace=True)
output_df

Unnamed: 0,program_id,model_id,fund_missing_at_vw,fund_missing_at_partner
0,1,259,,
1,1,260,,
2,1,261,,
3,1,262,,
4,1,263,,
5,1,264,,
6,1,265,,
7,1,266,,
8,1,267,,
9,1,268,,


In [231]:
# output_df.loc[(output_df.program_id==1)].index

In [232]:
# output_df.loc[(output_df.program_id==1)&(output_df.model_id==259)]

In [233]:
# output_df.at[output_df.loc[(output_df.program_id==1)&(output_df.model_id==259)].index, 'fund_missing_at_vw']= 'ntt;'

In [234]:
# output_df.at[output_df.loc[(output_df.program_id==1)&(output_df.model_id==259)].index, 'fund_missing_at_vw']= append_funds('bob', 'builder')

#### Now iterate through each row of combined_df to find mismatched funds

In [235]:
#append my funds fxn
def append_funds(current, new):
#     return current.append(new)
    return current + ','+ new

In [236]:
# append_funds('bob', 'builder')

In [237]:
# def populate_missing_fund_at_partner(model_id, program_id, missing_fund): #,db):
#     #find index for current model_id & program_id match
#     idx = output_df.loc[(output_df.program_id==program_id)&(output_df.model_id==model_id)].index
#     if output_df.at[idx, 'fund_missing_at_partner'] == 'None':
#         #if empty set to missing_fund
#         output_df.at[idx, 'fund_missing_at_partner'] = [missing_fund]
#     else:
#         #append missing_fund to current list
#         current_fund = output_df.at[idx, 'fund_missing_at_partner']
#         output_df.at[idx, 'fund_missing_at_partner']= append_funds(current_fund, missing_fund)
    

In [238]:
# def populate_missing_fund_at_vw(model_id, program_id, missing_fund): #,db):
#     #find indicies for all program_id matches
#     idx = output_df.loc[(output_df.program_id==program_id)].index
    
#     for i in idx:  
#         if output_df.at[i, 'fund_missing_at_vw'] == 'None':
#             #if empty set to missing_fund
#             output_df.at[i, 'fund_missing_at_vw'] = [missing_fund]
#         else:
#             #append missing_fund to current list
#             current_fund = output_df.at[i, 'fund_missing_at_vw']
#             output_df.at[i, 'fund_missing_at_vw']= append_funds(current_fund, missing_fund)

In [239]:
# combined function
def populate_missing_funds(model_id, program_id, missing_fund, db):
    # A fund missing at Vestwell applies to every model in the program;
    # a fund missing at the partner applies only to the model that holds it.
    if db == 'fund_missing_at_vw':
        idx = output_df.loc[(output_df.program_id==program_id)].index
    else:
        idx = output_df.loc[(output_df.program_id==program_id)&(output_df.model_id==model_id)].index

    for i in idx:
        if output_df.at[i, db] == 'None':
            # first missing fund for this row: start a new list
            output_df.at[i, db] = [missing_fund]
        else:
            # append missing_fund to the existing list
            output_df.at[i, db].append(missing_fund)
            

In [240]:
for index, row in combined_df.iterrows():
    if row['partner_program_id'] =='missing':
        #call populate output_df fund missing at partner
        populate_missing_funds(row['model_id'], row['program_id'], row['symbol'], 'fund_missing_at_partner')
        
    if row['program_id'] =='missing':
        #call populate output_df fund missing at vw
        populate_missing_funds(row['model_id'], row['partner_program_id'], row['FUNDID'], 'fund_missing_at_vw')


In [242]:
pd.set_option('display.max_colwidth', None)

In [243]:
output_df

Unnamed: 0,program_id,model_id,fund_missing_at_vw,fund_missing_at_partner
0,1,259,"[MedichamMega Medicham, Manectric, ManectricMega Manectric]","[Paras, Phanpy, Venonat, Donphan, Porygon2]"
1,1,260,"[MedichamMega Medicham, Manectric, ManectricMega Manectric]","[Psyduck, Rattata, Paras, Phanpy, Venonat, Donphan]"
2,1,261,"[MedichamMega Medicham, Manectric, ManectricMega Manectric]","[Psyduck, Rattata, Dugtrio, Paras, Phanpy, Venonat, Porygon2]"
3,1,262,"[MedichamMega Medicham, Manectric, ManectricMega Manectric]","[Persian, Psyduck, Rattata, Dugtrio, Paras, Phanpy, Venonat]"
4,1,263,"[MedichamMega Medicham, Manectric, ManectricMega Manectric]","[Persian, Psyduck, Rattata, Dugtrio, Paras, Phanpy]"
5,1,264,"[MedichamMega Medicham, Manectric, ManectricMega Manectric]","[Persian, Psyduck, Rattata, Dugtrio, Paras, Phanpy]"
6,1,265,"[MedichamMega Medicham, Manectric, ManectricMega Manectric]","[Persian, Psyduck, Rattata, Dugtrio, Paras, Phanpy]"
7,1,266,"[MedichamMega Medicham, Manectric, ManectricMega Manectric]","[Persian, Psyduck, Rattata, Dugtrio, Paras]"
8,1,267,"[MedichamMega Medicham, Manectric, ManectricMega Manectric]","[Persian, Psyduck, Rattata, Dugtrio, Paras]"
9,1,268,"[MedichamMega Medicham, Manectric, ManectricMega Manectric]","[Persian, Psyduck, Rattata, Dugtrio]"


In [255]:
combined_df[(combined_df.partner_program_id==1)]#&(combined_df.model_id=='missing')]

Unnamed: 0,PLANID,PLANINVCLOSEDATE,FUNDID,partner_program_id,model_id,program_id,model_props_id,symbol
483,VWPALL000076,active,Hitmontop,1,269,1,1906,Hitmontop
484,VWPALL000076,active,Smoochum,1,269,1,1910,Smoochum
485,VWPALL000076,active,Diglett,1,268,1,1897,Diglett
486,VWPALL000076,active,Diglett,1,269,1,1907,Diglett
487,VWPALL000076,active,Diglett,1,260,1,1841,Diglett
488,VWPALL000076,active,Diglett,1,261,1,1848,Diglett
489,VWPALL000076,active,Diglett,1,262,1,1856,Diglett
490,VWPALL000076,active,Diglett,1,263,1,1864,Diglett
491,VWPALL000076,active,Diglett,1,264,1,1871,Diglett
492,VWPALL000076,active,Diglett,1,265,1,1878,Diglett


In [268]:
combined_df.groupby(['partner_program_id','program_id','model_id'])[['FUNDID','symbol']].sum()

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,FUNDID,symbol
partner_program_id,program_id,model_id,Unnamed: 3_level_1,Unnamed: 4_level_1
1.0,1.0,260.0,DiglettTorchic,DiglettTorchic
1.0,1.0,261.0,Diglett,Diglett
1.0,1.0,262.0,Diglett,Diglett
1.0,1.0,263.0,Diglett,Diglett
1.0,1.0,264.0,Diglett,Diglett
1.0,1.0,265.0,Diglett,Diglett
1.0,1.0,266.0,Diglett,Diglett
1.0,1.0,267.0,Diglett,Diglett
1.0,1.0,268.0,Diglett,Diglett
1.0,1.0,269.0,HitmontopSmoochumDiglettGastlyTeddiursaMeowthSneaselXatuGrowlitheSmeargleStantlerTyrogue,HitmontopSmoochumDiglettGastlyTeddiursaMeowthSneaselXatuGrowlitheSmeargleStantlerTyrogue


In [None]:
#can outer join the below 2 df

In [273]:
combined_df.groupby(['partner_program_id','program_id','model_id'])['FUNDID'].apply(','.join).reset_index()

Unnamed: 0,partner_program_id,program_id,model_id,FUNDID
0,1,1,260,"Diglett,Torchic"
1,1,1,261,Diglett
2,1,1,262,Diglett
3,1,1,263,Diglett
4,1,1,264,Diglett
5,1,1,265,Diglett
6,1,1,266,Diglett
7,1,1,267,Diglett
8,1,1,268,Diglett
9,1,1,269,"Hitmontop,Smoochum,Diglett,Gastly,Teddiursa,Meowth,Sneasel,Xatu,Growlithe,Smeargle,Stantler,Tyrogue"


In [276]:
combined_df.groupby(['partner_program_id','program_id','model_id'])['symbol'].apply(','.join).reset_index()

Unnamed: 0,partner_program_id,program_id,model_id,symbol
0,1,1,260,"Diglett,Torchic"
1,1,1,261,Diglett
2,1,1,262,Diglett
3,1,1,263,Diglett
4,1,1,264,Diglett
5,1,1,265,Diglett
6,1,1,266,Diglett
7,1,1,267,Diglett
8,1,1,268,Diglett
9,1,1,269,"Hitmontop,Smoochum,Diglett,Gastly,Teddiursa,Meowth,Sneasel,Xatu,Growlithe,Smeargle,Stantler,Tyrogue"


In [253]:
partner[partner.partner_program_id==1]

Unnamed: 0,PLANID,PLANINVCLOSEDATE,FUNDID,partner_program_id
427,VWPALL000076,active,Medicham,1
428,VWPALL000076,active,Hitmontop,1
429,VWPALL000076,active,Smoochum,1
430,VWPALL000076,active,Diglett,1
431,VWPALL000076,01/15/2019,MedichamMega Medicham,1
432,VWPALL000076,active,Gastly,1
433,VWPALL000076,active,Teddiursa,1
434,VWPALL000076,active,Meowth,1
435,VWPALL000076,active,Sneasel,1
436,VWPALL000076,active,Electrike,1


In [249]:
model

Unnamed: 0,PLANID,PLANINVCLOSEDATE,FUNDID,partner_program_id
0,VW0008000039,active,Medicham,8
1,VW0008000039,active,Arcanine,8
2,VW0008000039,11/01/2018,Clefairy,8
3,VW0008000039,active,Zubat,8
4,VW0008000039,active,Nidoking,8
5,VW0008000039,active,Jigglypuff,8
6,VW0008000039,active,CharizardMega Charizard X,8
7,VW0008000039,active,Electrike,8
8,VW0008000039,active,Growlithe,8
9,VW0008000039,active,Fearow,8


In [259]:
model_prop[model_prop.model_id==268]

Unnamed: 0,model_props_id,model_id,symbol
523,1899,268,Persian
524,1897,268,Diglett
525,1900,268,Psyduck
526,1896,268,Rattata
527,1898,268,Dugtrio


In [260]:
model[model.model_id==268]

Unnamed: 0,model_id,program_id
14,268,1


In [222]:
combined_df

Unnamed: 0,PLANID,PLANINVCLOSEDATE,FUNDID,partner_program_id,model_id,program_id,model_props_id,symbol
1,VW0008000039,active,Arcanine,8,65,8,311,Arcanine
2,VW0008000039,11/01/2018,Clefairy,8,65,8,297,Clefairy
3,VW0008000039,active,Zubat,8,65,8,304,Zubat
4,VW0008000039,active,Nidoking,8,65,8,296,Nidoking
5,VW0008000039,active,Jigglypuff,8,65,8,302,Jigglypuff
6,VW0008000039,active,CharizardMega Charizard X,8,65,8,299,CharizardMega Charizard X
8,VW0008000039,active,Growlithe,8,65,8,310,Growlithe
9,VW0008000039,active,Fearow,8,55,8,286,Fearow
10,VW0008000039,active,Ekans,8,56,8,287,Ekans
11,VW0008000039,active,Arbok,8,57,8,288,Arbok


## Step 4 - Check for any closed funds
Check each `model_id` in each `program_id` in Vestwell's data to see if our partner has indicated a fund has closed.  We don't care about funds that have closed that aren't in Vestwell's data.  Add this information to your output from step 3.
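
A possible shape for this step is to keep only the partner rows that carry a parseable close date, then inner-join them to Vestwell's funds so that closed funds Vestwell does not hold fall away. The sketch below uses toy data. `find_closed_funds` is a hypothetical helper, and the month/day/year format is an assumption based on the sample rows.

```python
import pandas as pd


def find_closed_funds(vw: pd.DataFrame, partner: pd.DataFrame) -> pd.DataFrame:
    """List, per (program_id, model_id), the Vestwell funds the partner marks as closed."""
    # Values that are not dates (for example "active") become NaT.
    close_dates = pd.to_datetime(
        partner["PLANINVCLOSEDATE"], format="%m/%d/%Y", errors="coerce"
    )
    closed = partner.loc[
        close_dates.notna(), ["partner_program_id", "FUNDID"]
    ].drop_duplicates()
    # The inner join discards closed funds that Vestwell does not hold.
    hits = vw.merge(
        closed, how="inner",
        left_on=["program_id", "symbol"],
        right_on=["partner_program_id", "FUNDID"],
    )
    return (
        hits.groupby(["program_id", "model_id"])["symbol"]
        .apply(lambda s: sorted(set(s)))
        .rename("fund_closed")
        .reset_index()
    )


# Toy data: fund G is closed at the partner but not held by Vestwell.
vw_sample = pd.DataFrame({
    "program_id": [2, 2, 2, 2],
    "model_id": [5, 5, 5, 6],
    "symbol": ["D", "E", "F", "E"],
})
partner_sample = pd.DataFrame({
    "partner_program_id": [2, 2, 2],
    "FUNDID": ["D", "E", "G"],
    "PLANINVCLOSEDATE": ["11/01/2019", "active", "01/15/2019"],
})

closed_result = find_closed_funds(vw_sample, partner_sample)
print(closed_result)
```

The result has one row per model with at least one closed fund, so it could be left-merged onto the step 3 output on `program_id` and `model_id`, with models that have no closed funds filled in as "None".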