# Vestwell Python Screener

If you have any questions or feel you are making assumptions, please record them in this notebook or in comments if you'd rather work in a `.py` file.  If you get stuck, try to explain in words how you would complete the task(s).

### Background

Vestwell provides a wide variety of investment choices to its users.  Participants in a retirement plan can choose between a pre-determined set of funds or they can choose their own custom set of funds from a list of choices.  Advisors can create their own models with a custom set of funds in which participants can choose to invest.  As a result, there are thousands of unique models on the Vestwell platform.  

One of Vestwell's partners has the same list of models in its database.  This partner will maintain an up-to-date list of funds for each model in their database.  For example, when a fund closes and is replaced by a new one, Vestwell's partner will update the model with the new fund in their database, but not in Vestwell's.  For this reason, Vestwell's database and our partner's database will get out of sync over time.  Unless, that is, you can build a Python script to reconcile the two databases.  We're rooting for you!

### The Data

Here's a high-level overview of the data.  We'll get into more details below as we dig in.

**Vestwell Data**

Each `program_id` has many `model_id`s.  Each `model_id` has many `symbol`s.

* model.csv:  Associations of programs to models.
* model_prop.csv:  Association of models to symbols.

**Partner Data**

Each `PLANID` has many `FUNDID`s.

* partner.csv:  Association of `PLANID` to `FUNDID` and `PLANINVCLOSEDATE`.  

**Some extra notes**

* The `FUNDID` in our partner's data is equivalent to `symbol` in Vestwell's data.  These are also referred to as "funds".
* The `PLANID` in our partner's data has information that is equivalent to the `program_id` in Vestwell's database (more details below in Step 2).
* The `PLANINVCLOSEDATE` in our partner's database is the date when a fund was closed.  If there isn't a date, then the fund has not been closed.
* Sometimes our partner has funds called either "Medicham" or "Electrike" which we ignore.
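
The `PLANINVCLOSEDATE` note above can be turned into a boolean "is closed" flag. The sketch below is only an illustration on a hypothetical three-row sample (the `sample` frame and the `close_date` / `is_closed` column names are not part of the exercise). It assumes dates are written as `MM/DD/YYYY` and that open funds carry either no value or the string `active`, as in the abridged file loaded later in this notebook.

```python
import pandas as pd

# Hypothetical sample mirroring the PLANINVCLOSEDATE values seen in this notebook.
sample = pd.DataFrame({
    "FUNDID": ["Arcanine", "Clefairy", "Zubat"],
    "PLANINVCLOSEDATE": ["active", "11/01/2018", None],
})

# Anything that is not a valid MM/DD/YYYY date ("active", blanks) becomes NaT.
sample["close_date"] = pd.to_datetime(
    sample["PLANINVCLOSEDATE"], format="%m/%d/%Y", errors="coerce"
)

# A fund is treated as closed only when a real close date was parsed.
sample["is_closed"] = sample["close_date"].notna()
```

Parsing to real timestamps also makes it possible to filter by date later, for example to report only funds closed before a given day.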

### Goal
The goal of this exercise is to compare Vestwell's data with our partner's data and determine whether Vestwell's model data matches our partner's model data.  We consider our partner's database the source of truth since it is kept up to date whenever funds change.  Here's specifically what we are asking:

1.  Does the list of funds for each `program_id` in Vestwell's database match the list of funds in our partner's database?  If there are any mismatches, which funds are missing from each database?

For example, if Vestwell's database has funds A, B and C for a `program_id` and our partner's database has funds B, C, and D for the same `program_id` we would report that fund A is missing from our partner's database and that fund D is missing from our database.

2.  Are there any funds in Vestwell's database that have closed?  If so, what are they for each `program_id`?

For example, if our database has funds D, E, and F for a `program_id` and partner's database shows that fund D closed on 11/1/2019, then we would report that fund D has closed for that `program_id`.
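
Both examples above reduce to set operations. The following is a minimal sketch using the literal fund letters from the two examples; the variable names and the `partner_close_dates` mapping are made up for illustration.

```python
# Question 1: Vestwell has A, B, C; the partner has B, C, D for the same program_id.
vw_funds = {"A", "B", "C"}
partner_funds = {"B", "C", "D"}

missing_at_partner = vw_funds - partner_funds  # funds only Vestwell has
missing_at_vw = partner_funds - vw_funds       # funds only the partner has

# Question 2: Vestwell has D, E, F; the partner reports that D closed on 11/1/2019.
vw_funds_q2 = {"D", "E", "F"}
partner_close_dates = {"D": "11/01/2019", "E": None, "F": None}

# A fund counts as closed when the partner recorded a close date for it.
closed = {fund for fund in vw_funds_q2 if partner_close_dates.get(fund)}
```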

Ideally, the output is in a form that can be passed to a Business Analyst to take action on.  For example, the output could look something like this:

| program_id | fund_missing_at_vw | fund_missing_at_partner | fund_closed |
|------------|--------------------|-------------------------|-------------|
| 1          | None               | None                    | None        |
| 2          | D, Z               | A                       | F           |
| 3          | F                  | None                    | F           |

## Table Schema


![Schema](data/Vestwell_challange.png)

## Step 0
Import any packages you'll need

In [1]:
import pandas as pd
import numpy as np
import math

## Step 1
Import `partner.csv`, `model.csv`, and `model_prop.csv`.

### Load Files

In [2]:
model = pd.read_csv('data/model_abridged.csv')
model.head()

Unnamed: 0,model_id,program_id
0,28,3
1,34,4
2,42,4
3,24,3
4,64,8


In [3]:
model.shape

(187, 2)

In [5]:
model_prop = pd.read_csv('data/model_props_abridged.csv')
model_prop.head()

Unnamed: 0,model_props_id,model_id,symbol
0,541,80,Bulbasaur
1,542,80,Ivysaur
2,543,80,Venusaur
3,544,80,VenusaurMega Venusaur
4,545,80,Charmander


In [6]:
model_prop.shape

(729, 3)

In [7]:
partner = pd.read_csv('data/partner_abridged.csv')
partner.head()

Unnamed: 0,PLANID,PLANINVCLOSEDATE,FUNDID
0,VW0008000039,active,Medicham
1,VW0008000039,active,Arcanine
2,VW0008000039,11/01/2018,Clefairy
3,VW0008000039,active,Zubat
4,VW0008000039,active,Nidoking


## Find Unique Columns in each table
Maybe delete later

### in Model table model_id is unique

In [8]:
#model.model_id looks like a unique key
model.model_id.value_counts()

336    1
74     1
83     1
82     1
81     1
      ..
155    1
154    1
153    1
152    1
282    1
Name: model_id, Length: 187, dtype: int64

### in model_prop table model_props_id is unique

In [9]:
#model_prop.model_props_id looks like a unique key
model_prop.model_props_id.value_counts()

2525    1
1382    1
354     1
353     1
352     1
       ..
2700    1
2699    1
2698    1
2697    1
1615    1
Name: model_props_id, Length: 729, dtype: int64

### Partner table does NOT have any unique columns

In [10]:
partner.FUNDID.value_counts()

Electrike                    18
Medicham                     12
CharizardMega Charizard X     9
Charmander                    9
Gastly                        6
                             ..
Sudowoodo                     1
Feraligatr                    1
Kingler                       1
Krabby                        1
Flaaffy                       1
Name: FUNDID, Length: 250, dtype: int64

In [11]:
partner.PLANID.value_counts()

VW0027000228    39
VW0009000188    36
VW0015000141    35
VW0008000039    31
VW0012000114    28
VW0030000255    26
VW0013000121    25
VW0014000136    25
VW0021000178    25
VW0017000143    24
VW0020000173    24
VW0029000243    22
VW0019000168    20
VW0016000137    19
VWPALL000076    18
VW0028000216    17
VW0024000187    16
VW0010000135    15
Name: PLANID, dtype: int64
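
Instead of reading `value_counts()` output by eye, the same uniqueness check can be done programmatically with `Series.is_unique`. This is a sketch on small hypothetical frames (`toy_model`, `toy_partner`, and the helper `unique_columns` are illustrative names, not part of the exercise data).

```python
import pandas as pd

def unique_columns(df: pd.DataFrame) -> list:
    """Return the names of the columns whose values never repeat."""
    return [col for col in df.columns if df[col].is_unique]

# Hypothetical miniature versions of the model and partner tables.
toy_model = pd.DataFrame({"model_id": [28, 34, 42], "program_id": [3, 4, 4]})
toy_partner = pd.DataFrame({
    "PLANID": ["VW0008000039", "VW0008000039", "VW0009000188"],
    "FUNDID": ["Zubat", "Arcanine", "Zubat"],
})

model_keys = unique_columns(toy_model)      # only model_id qualifies
partner_keys = unique_columns(toy_partner)  # no single column is unique
```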

## Step 2 - working with the `partner.csv` data
Extract the `program_id` from the `PLANID` column in the `partner` dataframe.  The `program_id` is the first four characters in `PLANID` after "VW".  It's usually an integer.  If instead of digits, those characters are equal to "PALL" then the `program_id` = 1.  Drop any other rows remaining that do not have four digits in the first four characters after "VW" in the `PLANID` column.

In [12]:
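def get_plan(plan_id):
    """Return the program_id encoded in the four characters after 'VW', or NaN."""
    code = plan_id[2:6]
    #'PALL' is a special case that maps to program_id 1
    if code == 'PALL':
        return 1
    #exactly four digits -> int (this also removes leading zeros, e.g. '0008' -> 8)
    if len(code) == 4 and code.isdecimal():
        return int(code)
    #anything else returns NaN so the row can be dropped later
    return np.nan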

In [13]:
partner.head(10)

Unnamed: 0,PLANID,PLANINVCLOSEDATE,FUNDID
0,VW0008000039,active,Medicham
1,VW0008000039,active,Arcanine
2,VW0008000039,11/01/2018,Clefairy
3,VW0008000039,active,Zubat
4,VW0008000039,active,Nidoking
5,VW0008000039,active,Jigglypuff
6,VW0008000039,active,CharizardMega Charizard X
7,VW0008000039,active,Electrike
8,VW0008000039,active,Growlithe
9,VW0008000039,active,Fearow


In [14]:
partner['partner_program_id'] = 0

In [15]:
partner.partner_program_id = partner.PLANID.apply(lambda x: get_plan(x))

In [16]:
#drop the rows where no program_id could be extracted (NaN)
partner.dropna(subset=['partner_program_id'], inplace=True)

In [17]:
partner.shape

(445, 4)

In [18]:
partner.head(10)

Unnamed: 0,PLANID,PLANINVCLOSEDATE,FUNDID,partner_program_id
0,VW0008000039,active,Medicham,8
1,VW0008000039,active,Arcanine,8
2,VW0008000039,11/01/2018,Clefairy,8
3,VW0008000039,active,Zubat,8
4,VW0008000039,active,Nidoking,8
5,VW0008000039,active,Jigglypuff,8
6,VW0008000039,active,CharizardMega Charizard X,8
7,VW0008000039,active,Electrike,8
8,VW0008000039,active,Growlithe,8
9,VW0008000039,active,Fearow,8
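
As an alternative to applying `get_plan` row by row, the Step 2 rule can be expressed with vectorized string methods. The sketch below runs on a hypothetical `plan_ids` series (the `VWABCD000001` entry is invented to show the "drop" case) and keeps the result as a nullable integer so that 8 does not turn into 8.0.

```python
import pandas as pd

# Hypothetical PLANID values: two numeric codes, the PALL special case, one invalid code.
plan_ids = pd.Series(["VW0008000039", "VWPALL000076", "VWABCD000001", "VW0030000255"])

# Capture exactly four digits right after "VW"; non-matching rows become missing.
digits = plan_ids.str.extract(r"^VW(\d{4})", expand=False)
program_id = pd.to_numeric(digits)

# "PALL" maps to program_id 1; everything else that did not match stays missing.
program_id = program_id.mask(plan_ids.str[2:6] == "PALL", 1).astype("Int64")

# Rows without a program_id are the ones Step 2 asks to drop.
valid_plan_ids = plan_ids[program_id.notna()]
```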


## Step 3
Check if the funds match for each `program_id`.  In `partner.csv` the funds are in the `FUNDID` column and for `model_prop.csv` the funds are in the `symbol` column.  If there are any mismatches, return a list of which funds are missing from each database for each `program_id`.

### Merge model and model_prop df on model_id column

In [19]:
vw_df = pd.merge(model, model_prop,  how='left', left_on=['model_id'], 
                     right_on = ['model_id'])

In [20]:
vw_df

Unnamed: 0,model_id,program_id,model_props_id,symbol
0,28,3,119,Caterpie
1,28,3,120,Metapod
2,28,3,121,Butterfree
3,28,3,122,Weedle
4,28,3,123,Kakuna
...,...,...,...,...
724,381,30,2918,Raichu
725,382,30,2919,Sandshrew
726,383,30,2920,Sandslash
727,384,30,2921,Nidorina


In [21]:
vw_df.shape

(729, 4)

In [22]:
vw_df.isnull().sum()

model_id          0
program_id        0
model_props_id    0
symbol            0
dtype: int64

### Now combine vw_df with partner

In [23]:
combined_df = pd.merge(partner, vw_df,  how='outer', left_on=['FUNDID', 'partner_program_id'], 
                     right_on = ['symbol','program_id'])

In [24]:
combined_df

Unnamed: 0,PLANID,PLANINVCLOSEDATE,FUNDID,partner_program_id,model_id,program_id,model_props_id,symbol
0,VW0008000039,active,Medicham,8.0,,,,
1,VW0008000039,active,Arcanine,8.0,65.0,8.0,311.0,Arcanine
2,VW0008000039,11/01/2018,Clefairy,8.0,65.0,8.0,297.0,Clefairy
3,VW0008000039,active,Zubat,8.0,65.0,8.0,304.0,Zubat
4,VW0008000039,active,Nidoking,8.0,65.0,8.0,296.0,Nidoking
...,...,...,...,...,...,...,...,...
780,,,,,278.0,26.0,1962.0,Metapod
781,,,,,279.0,26.0,1963.0,Butterfree
782,,,,,280.0,26.0,1964.0,Weedle
783,,,,,281.0,26.0,1965.0,Kakuna


In [25]:
combined_df.shape

(785, 8)

In [26]:
combined_df.program_id.value_counts().sum()

729

In [27]:
combined_df.partner_program_id.value_counts().sum()

509
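
The outer merge above leaves NaN on whichever side had no match. One way to make that explicit is the `indicator=True` option of `merge`, which adds a `_merge` column with the values `left_only`, `right_only`, or `both`. This is a sketch on hypothetical toy frames (`toy_vw`, `toy_partner`), not the notebook's actual data.

```python
import pandas as pd

# Hypothetical miniature inputs; column names follow the notebook.
toy_vw = pd.DataFrame({
    "program_id": [8, 8, 9],
    "symbol": ["Arcanine", "Zubat", "Geodude"],
})
toy_partner = pd.DataFrame({
    "partner_program_id": [8, 8, 9],
    "FUNDID": ["Arcanine", "Carvanha", "Geodude"],
})

merged = toy_partner.merge(
    toy_vw,
    how="outer",
    left_on=["partner_program_id", "FUNDID"],
    right_on=["program_id", "symbol"],
    indicator=True,
)

# "left_only"  -> the fund exists only at the partner (missing at Vestwell)
# "right_only" -> the fund exists only at Vestwell (missing at the partner)
missing_at_vw = merged.loc[merged["_merge"] == "left_only", "FUNDID"].tolist()
missing_at_partner = merged.loc[merged["_merge"] == "right_only", "symbol"].tolist()
```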

### Drop Rows with "Medicham" or "Electrike"

In [28]:
dropindex = combined_df[combined_df['FUNDID'] == 'Medicham' ].index

In [29]:
#delete later
dropindex

Int64Index([0, 76, 91, 145, 223, 272, 305, 331, 347, 396, 443, 482], dtype='int64')

In [30]:
combined_df.drop(dropindex, inplace=True)

In [31]:
dropindex = combined_df[combined_df['FUNDID'] == 'Electrike' ].index

In [32]:
#delete later
dropindex

Int64Index([  7,  61,  88, 114, 131, 152, 185, 220, 229, 262, 295, 320, 342,
            372, 408, 430, 450, 500],
           dtype='int64')

In [33]:
combined_df.drop(dropindex, inplace=True)
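
The two drop passes above could also be written as a single boolean filter with `isin`. The sketch below uses a hypothetical `toy` frame and an illustrative `IGNORED_FUNDS` constant; like the cells above, it matches fund names exactly, so a name such as "MedichamMega Medicham" is kept.

```python
import pandas as pd

IGNORED_FUNDS = ["Medicham", "Electrike"]

# Hypothetical rows, including a missing FUNDID and a look-alike name.
toy = pd.DataFrame({
    "FUNDID": ["Medicham", "Arcanine", "Electrike", None, "MedichamMega Medicham"],
})

# Keep rows whose FUNDID is not exactly one of the ignored names.
# Rows with a missing FUNDID are kept, because a missing value is not in the list.
kept = toy[~toy["FUNDID"].isin(IGNORED_FUNDS)]
```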

#### Now Check size and counts

In [34]:
combined_df.shape

(755, 8)

In [35]:
combined_df.program_id.value_counts().sum()

729

In [36]:
combined_df.partner_program_id.value_counts().sum()

479

### Placeholder for na

In [37]:
combined_df.fillna('missing', inplace=True)

In [38]:
combined_df

Unnamed: 0,PLANID,PLANINVCLOSEDATE,FUNDID,partner_program_id,model_id,program_id,model_props_id,symbol
1,VW0008000039,active,Arcanine,8,65,8,311,Arcanine
2,VW0008000039,11/01/2018,Clefairy,8,65,8,297,Clefairy
3,VW0008000039,active,Zubat,8,65,8,304,Zubat
4,VW0008000039,active,Nidoking,8,65,8,296,Nidoking
5,VW0008000039,active,Jigglypuff,8,65,8,302,Jigglypuff
...,...,...,...,...,...,...,...,...
780,missing,missing,missing,missing,278,26,1962,Metapod
781,missing,missing,missing,missing,279,26,1963,Butterfree
782,missing,missing,missing,missing,280,26,1964,Weedle
783,missing,missing,missing,missing,281,26,1965,Kakuna


### Create Dictionaries of program_id for vw and partner

In [39]:
vw_dict = {}
for prog_id in combined_df.program_id.values:
    if prog_id in vw_dict:
        pass
    else:
        vw_dict[prog_id]= []

In [40]:
#del later
vw_dict

{8.0: [],
 'missing': [],
 9.0: [],
 10.0: [],
 12.0: [],
 13.0: [],
 14.0: [],
 15.0: [],
 16.0: [],
 17.0: [],
 19.0: [],
 20.0: [],
 21.0: [],
 24.0: [],
 27.0: [],
 28.0: [],
 29.0: [],
 30.0: [],
 1.0: [],
 3.0: [],
 4.0: [],
 11.0: [],
 22.0: [],
 26.0: []}

In [41]:
partner_dict = {}
for prog_id in combined_df.partner_program_id.values:
    if prog_id in partner_dict:
        pass
    else:
        partner_dict[prog_id]= []

In [42]:
#del later
partner_dict

{8.0: [],
 9.0: [],
 10.0: [],
 12.0: [],
 13.0: [],
 14.0: [],
 15.0: [],
 16.0: [],
 17.0: [],
 19.0: [],
 20.0: [],
 21.0: [],
 24.0: [],
 27.0: [],
 28.0: [],
 29.0: [],
 30.0: [],
 1.0: [],
 'missing': []}

### Now add funds associated with each program_id dictionary

In [43]:
#adding funds to each dict
for id_, fund in zip(combined_df.program_id.values, combined_df.symbol.values):
    vw_dict[id_].append(fund)
for id_, fund in zip(combined_df.partner_program_id.values, combined_df.FUNDID.values):
        partner_dict[id_].append(fund)    


In [44]:
# delete missing key and values
del vw_dict['missing']
del partner_dict['missing']
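
The dictionaries built above map each program to a list of funds. A shorter route to the same kind of structure, assuming the data is still in a DataFrame, is a `groupby` that collects each group into a set. The `toy` frame here is hypothetical.

```python
import pandas as pd

# Hypothetical rows; "Zubat" appears twice, as it would if two models held the same fund.
toy = pd.DataFrame({
    "program_id": [8, 8, 8, 9],
    "symbol": ["Arcanine", "Zubat", "Zubat", "Geodude"],
})

# One set of funds per program; duplicates collapse automatically.
funds_by_program = toy.groupby("program_id")["symbol"].apply(set).to_dict()
```

Because `groupby` skips rows whose key is NaN by default, this approach would not need a `'missing'` placeholder key that has to be deleted afterwards.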

In [45]:
#cast each list of values to a set

### Now build output_dict

#### Building output dict

In [46]:
partner_dict.keys()

dict_keys([8.0, 9.0, 10.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 19.0, 20.0, 21.0, 24.0, 27.0, 28.0, 29.0, 30.0, 1.0])

In [47]:
# start by populating the keys(program_ids)
output_dict_id = {}
for prog_id in partner_dict.keys():
    if prog_id in output_dict_id:
        pass
    else:
        output_dict_id[prog_id]= {'vw': [], 'partner': []}
#add any missing keys from vw dict
for prog_id in vw_dict.keys():
    if prog_id in output_dict_id:
        pass
    else:
        output_dict_id[prog_id]= {'vw': [], 'partner': []}

In [48]:
#delete later
output_dict_id

{8.0: {'vw': [], 'partner': []},
 9.0: {'vw': [], 'partner': []},
 10.0: {'vw': [], 'partner': []},
 12.0: {'vw': [], 'partner': []},
 13.0: {'vw': [], 'partner': []},
 14.0: {'vw': [], 'partner': []},
 15.0: {'vw': [], 'partner': []},
 16.0: {'vw': [], 'partner': []},
 17.0: {'vw': [], 'partner': []},
 19.0: {'vw': [], 'partner': []},
 20.0: {'vw': [], 'partner': []},
 21.0: {'vw': [], 'partner': []},
 24.0: {'vw': [], 'partner': []},
 27.0: {'vw': [], 'partner': []},
 28.0: {'vw': [], 'partner': []},
 29.0: {'vw': [], 'partner': []},
 30.0: {'vw': [], 'partner': []},
 1.0: {'vw': [], 'partner': []},
 3.0: {'vw': [], 'partner': []},
 4.0: {'vw': [], 'partner': []},
 11.0: {'vw': [], 'partner': []},
 22.0: {'vw': [], 'partner': []},
 26.0: {'vw': [], 'partner': []}}

#### Now populate with Funds

In [49]:
#vw funds
for key,value in vw_dict.items():
    output_dict_id[key]['vw']=value
#partner funds
for key,value in partner_dict.items():
    output_dict_id[key]['partner']=value
    
#cast each list of values

In [50]:
output_dict_id

{8.0: {'vw': ['Arcanine',
   'Clefairy',
   'Zubat',
   'Nidoking',
   'Jigglypuff',
   'CharizardMega Charizard X',
   'Growlithe',
   'Fearow',
   'Ekans',
   'Arbok',
   'Raichu',
   'Sandslash',
   'Nidoqueen',
   'Oddish',
   'Clefable',
   'Vileplume',
   'Ninetales',
   'Primeape',
   'Vulpix',
   'Wigglytuff',
   'Golbat',
   'Gloom',
   'Raticate',
   'Spearow',
   'Pikachu',
   'Sandshrew',
   'Nidorina',
   'Nidorino'],
  'partner': ['Arcanine',
   'Clefairy',
   'Zubat',
   'Nidoking',
   'Jigglypuff',
   'CharizardMega Charizard X',
   'Growlithe',
   'Fearow',
   'Ekans',
   'Arbok',
   'Raichu',
   'Sandslash',
   'Nidoqueen',
   'Oddish',
   'Clefable',
   'Vileplume',
   'Carvanha',
   'Ninetales',
   'Primeape',
   'Vulpix',
   'Wigglytuff',
   'Golbat',
   'Gloom',
   'Raticate',
   'Spearow',
   'Pikachu',
   'Sandshrew',
   'Nidorina',
   'Nidorino']},
 9.0: {'vw': ['Poliwag',
   'Geodude',
   'Zubat',
   'Poliwhirl',
   'Golem',
   'Rapidash',
   'Rapidash',
   'S

### Create dict of fund_missing_at_partner & fund_missing_at_vw  
Populate with missing values for each column

In [51]:
#start with an empty list for every program_id in both dicts
fund_missing_at_partner ={}
fund_missing_at_vw ={}
for keys in output_dict_id.keys():
    fund_missing_at_partner[keys] = []
    fund_missing_at_vw[keys] = []

In [52]:
#funds in vw_dict that are not in partner_dict, per program_id
for k in fund_missing_at_partner.keys():
    #a program_id that exists on only one side is left with an empty list
    if k not in vw_dict or k not in partner_dict:
        continue
    partner_funds = set(partner_dict[k])
    for each in set(vw_dict[k]):
        if each not in partner_funds:
            fund_missing_at_partner[k].append(each)

In [53]:
# set(vw_dict[8])

In [54]:
# set(partner_dict[8])

In [55]:
#delete later
fund_missing_at_partner

{8.0: [],
 9.0: [],
 10.0: [],
 12.0: [],
 13.0: [],
 14.0: [],
 15.0: [],
 16.0: ['Steelix'],
 17.0: [],
 19.0: [],
 20.0: [],
 21.0: [],
 24.0: ['Onix'],
 27.0: ['SceptileMega Sceptile'],
 28.0: [],
 29.0: ['Phanpy',
  'Psyduck',
  'Dugtrio',
  'Donphan',
  'Porygon2',
  'Venonat',
  'Persian',
  'Rattata',
  'Paras'],
 30.0: [],
 1.0: ['Phanpy',
  'Psyduck',
  'Dugtrio',
  'Donphan',
  'Porygon2',
  'Venonat',
  'Persian',
  'Rattata',
  'Paras'],
 3.0: [],
 4.0: [],
 11.0: [],
 22.0: [],
 26.0: []}

In [56]:
#funds in partner_dict that are not in vw_dict, per program_id
for k in fund_missing_at_vw.keys():
    #a program_id that exists on only one side is left with an empty list
    if k not in vw_dict or k not in partner_dict:
        continue
    vw_funds = set(vw_dict[k])
    for each in set(partner_dict[k]):
        if each not in vw_funds:
            fund_missing_at_vw[k].append(each)

In [57]:
#delete later
fund_missing_at_vw

{8.0: ['Carvanha'],
 9.0: ['Sharpedo'],
 10.0: [],
 12.0: [],
 13.0: [],
 14.0: [],
 15.0: ['SharpedoMega Sharpedo',
  'Wailmer',
  'CharizardMega Charizard Y',
  'CharizardMega Charizard X',
  'Charmander',
  'BlastoiseMega Blastoise'],
 16.0: [],
 17.0: [],
 19.0: [],
 20.0: [],
 21.0: [],
 24.0: ['Wailord'],
 27.0: ['Plusle', 'Sharpedo'],
 28.0: ['Bulbasaur', 'Plusle'],
 29.0: ['Venomoth',
  'Parasect',
  'Minun',
  'Golduck',
  'MedichamMega Medicham',
  'Illumise',
  'Volbeat',
  'Roselia',
  'Gulpin'],
 30.0: ['Numel'],
 1.0: ['ManectricMega Manectric', 'MedichamMega Medicham', 'Manectric'],
 3.0: [],
 4.0: [],
 11.0: [],
 22.0: [],
 26.0: []}
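
The two comparison loops above can be folded into one helper built on set difference. This is a sketch with a hypothetical `missing_funds` function and toy dictionaries. Note one deliberate difference, which is an assumption rather than the notebook's behaviour: a program that is absent from the other side is treated here as having no funds there, so all of its funds are reported, whereas the cells above leave such programs empty.

```python
def missing_funds(source, other):
    """For each program in `source`, list the funds that `other` lacks for that program."""
    return {
        prog_id: sorted(set(funds) - set(other.get(prog_id, ())))
        for prog_id, funds in source.items()
    }

# Hypothetical inputs: program 3 exists only on the Vestwell side.
toy_vw = {8: ["Arcanine", "Zubat"], 16: ["Steelix", "Onix"], 3: ["Caterpie"]}
toy_partner = {8: ["Arcanine", "Zubat", "Carvanha"], 16: ["Onix"]}

missing_at_partner = missing_funds(toy_vw, toy_partner)
missing_at_vw = missing_funds(toy_partner, toy_vw)
```

Sorting the result keeps the output stable from run to run, which plain set iteration does not guarantee.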

In [58]:
#in both fund_missing dicts: convert non-empty lists to sets and replace empty lists with NaN
for key in fund_missing_at_partner.keys():
    if fund_missing_at_partner[key]:#if something is there
        fund_missing_at_partner[key] =set(fund_missing_at_partner[key])
    else:
        fund_missing_at_partner[key] = math.nan
        
for key in fund_missing_at_vw.keys():
    if fund_missing_at_vw[key]:#if something is there
        fund_missing_at_vw[key] =set(fund_missing_at_vw[key])
    else:
        fund_missing_at_vw[key] = math.nan

In [59]:
fund_missing_at_partner

{8.0: nan,
 9.0: nan,
 10.0: nan,
 12.0: nan,
 13.0: nan,
 14.0: nan,
 15.0: nan,
 16.0: {'Steelix'},
 17.0: nan,
 19.0: nan,
 20.0: nan,
 21.0: nan,
 24.0: {'Onix'},
 27.0: {'SceptileMega Sceptile'},
 28.0: nan,
 29.0: {'Donphan',
  'Dugtrio',
  'Paras',
  'Persian',
  'Phanpy',
  'Porygon2',
  'Psyduck',
  'Rattata',
  'Venonat'},
 30.0: nan,
 1.0: {'Donphan',
  'Dugtrio',
  'Paras',
  'Persian',
  'Phanpy',
  'Porygon2',
  'Psyduck',
  'Rattata',
  'Venonat'},
 3.0: nan,
 4.0: nan,
 11.0: nan,
 22.0: nan,
 26.0: nan}

In [60]:
fund_missing_at_vw

{8.0: {'Carvanha'},
 9.0: {'Sharpedo'},
 10.0: nan,
 12.0: nan,
 13.0: nan,
 14.0: nan,
 15.0: {'BlastoiseMega Blastoise',
  'CharizardMega Charizard X',
  'CharizardMega Charizard Y',
  'Charmander',
  'SharpedoMega Sharpedo',
  'Wailmer'},
 16.0: nan,
 17.0: nan,
 19.0: nan,
 20.0: nan,
 21.0: nan,
 24.0: {'Wailord'},
 27.0: {'Plusle', 'Sharpedo'},
 28.0: {'Bulbasaur', 'Plusle'},
 29.0: {'Golduck',
  'Gulpin',
  'Illumise',
  'MedichamMega Medicham',
  'Minun',
  'Parasect',
  'Roselia',
  'Venomoth',
  'Volbeat'},
 30.0: {'Numel'},
 1.0: {'Manectric', 'ManectricMega Manectric', 'MedichamMega Medicham'},
 3.0: nan,
 4.0: nan,
 11.0: nan,
 22.0: nan,
 26.0: nan}

### Build output df and assign each missing fund value 

In [61]:
output_df = pd.DataFrame(columns=['program_id', 'fund_missing_at_vw', 'fund_missing_at_partner', 'fund_closed'])

In [62]:
output_df.program_id = output_dict_id.keys()
output_df.set_index(['program_id'], inplace=True)

In [63]:
output_df

Unnamed: 0_level_0,fund_missing_at_vw,fund_missing_at_partner,fund_closed
program_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
8.0,,,
9.0,,,
10.0,,,
12.0,,,
13.0,,,
14.0,,,
15.0,,,
16.0,,,
17.0,,,
19.0,,,


In [64]:
def get_fund(x, missing_fund_dict):
    return missing_fund_dict[x]

In [65]:
output_df.reset_index(inplace=True)

In [66]:
output_df.fund_missing_at_vw = output_df.program_id.apply(lambda x: get_fund(x,fund_missing_at_vw))

In [67]:
output_df.fund_missing_at_partner = output_df.program_id.apply(lambda x: get_fund(x,fund_missing_at_partner))

In [68]:
output_df

Unnamed: 0,program_id,fund_missing_at_vw,fund_missing_at_partner,fund_closed
0,8.0,{Carvanha},,
1,9.0,{Sharpedo},,
2,10.0,,,
3,12.0,,,
4,13.0,,,
5,14.0,,,
6,15.0,"{SharpedoMega Sharpedo, Charmander, CharizardM...",,
7,16.0,,{Steelix},
8,17.0,,,
9,19.0,,,


In [69]:
output_df.fillna('None', inplace=True)

## Step 4 - Check for any closed funds
For each `program_id`, check whether our partner has marked as closed any fund that is in Vestwell's `model`, and add the result to the output from Step 3.

In [70]:
#set output_df index to program_id
output_df.set_index(['program_id'], inplace=True)

In [71]:
combined_df.replace('missing', np.nan, inplace=True)

In [72]:
combined_df

Unnamed: 0,PLANID,PLANINVCLOSEDATE,FUNDID,partner_program_id,model_id,program_id,model_props_id,symbol
1,VW0008000039,active,Arcanine,8.0,65.0,8.0,311.0,Arcanine
2,VW0008000039,11/01/2018,Clefairy,8.0,65.0,8.0,297.0,Clefairy
3,VW0008000039,active,Zubat,8.0,65.0,8.0,304.0,Zubat
4,VW0008000039,active,Nidoking,8.0,65.0,8.0,296.0,Nidoking
5,VW0008000039,active,Jigglypuff,8.0,65.0,8.0,302.0,Jigglypuff
...,...,...,...,...,...,...,...,...
780,,,,,278.0,26.0,1962.0,Metapod
781,,,,,279.0,26.0,1963.0,Butterfree
782,,,,,280.0,26.0,1964.0,Weedle
783,,,,,281.0,26.0,1965.0,Kakuna


In [73]:
for index, row in combined_df.iterrows():
    #a fund is closed when the partner row has a real close date (not 'active', not NaN);
    #only rows that also matched a Vestwell program_id are reported
    if (row['PLANINVCLOSEDATE'] != 'active') and pd.notna(row['PLANINVCLOSEDATE']) and pd.notna(row['program_id']):
        if output_df.at[row['program_id'], 'fund_closed'] == 'None':
            output_df.at[row['program_id'], 'fund_closed'] = row['FUNDID']
        else:
            output_df.at[row['program_id'], 'fund_closed'] = output_df.at[row['program_id'],
                                                                          'fund_closed'] + ', ' + row['FUNDID']
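
The row-by-row loop above can also be sketched as a boolean mask followed by a `groupby`. The `toy` frame below is hypothetical and the dates in it are invented. Unlike the loop, this version de-duplicates and sorts the fund names within each program, which is a design choice of the sketch.

```python
import pandas as pd

# Hypothetical merged rows: the last one has no Vestwell program_id and is ignored.
toy = pd.DataFrame({
    "program_id": [8, 8, 9, 9, None],
    "FUNDID": ["Clefairy", "Zubat", "Poliwhirl", "Golem", "Numel"],
    "PLANINVCLOSEDATE": ["11/01/2018", "active", "03/15/2019", "04/01/2019", "01/01/2020"],
})

# Closed = has a close date that is not the "active" marker, and matched a program_id.
closed_mask = (
    toy["PLANINVCLOSEDATE"].notna()
    & (toy["PLANINVCLOSEDATE"] != "active")
    & toy["program_id"].notna()
)

# One comma-separated string of closed funds per program.
closed_by_program = (
    toy[closed_mask]
    .groupby("program_id")["FUNDID"]
    .agg(lambda funds: ", ".join(sorted(set(funds))))
)
```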
        

In [74]:
output_df

Unnamed: 0_level_0,fund_missing_at_vw,fund_missing_at_partner,fund_closed
program_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
8.0,{Carvanha},,Clefairy
9.0,{Sharpedo},,Poliwhirl
10.0,,,
12.0,,,
13.0,,,
14.0,,,Dratini
15.0,"{SharpedoMega Sharpedo, Charmander, CharizardM...",,"Sunkern, Slowking, Espeon, Politoed, Murkrow, ..."
16.0,,{Steelix},
17.0,,,
19.0,,,


### Clean output_df to look like example

In [75]:
def cleaner(x):
    if x =='None':
        return x
    return ', '.join(str(e) for e in x)

In [76]:
output_df.fund_missing_at_vw = output_df.fund_missing_at_vw.apply(cleaner)
output_df.fund_missing_at_partner = output_df.fund_missing_at_partner.apply(cleaner)

In [77]:
output_df

Unnamed: 0_level_0,fund_missing_at_vw,fund_missing_at_partner,fund_closed
program_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
8.0,Carvanha,,Clefairy
9.0,Sharpedo,,Poliwhirl
10.0,,,
12.0,,,
13.0,,,
14.0,,,Dratini
15.0,"SharpedoMega Sharpedo, Charmander, CharizardMe...",,"Sunkern, Slowking, Espeon, Politoed, Murkrow, ..."
16.0,,Steelix,
17.0,,,
19.0,,,


In [78]:
output_df.sort_index()

Unnamed: 0_level_0,fund_missing_at_vw,fund_missing_at_partner,fund_closed
program_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
1.0,"MedichamMega Medicham, Manectric, ManectricMeg...","Phanpy, Donphan, Psyduck, Porygon2, Persian, P...",
3.0,,,
4.0,,,
8.0,Carvanha,,Clefairy
9.0,Sharpedo,,Poliwhirl
10.0,,,
11.0,,,
12.0,,,
13.0,,,
14.0,,,Dratini


In [79]:
output_df.sort_index().to_csv('output_df.csv')
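
Since the file is meant for a Business Analyst, it may read better with `program_id` written as `8` rather than `8.0`. The following is a small sketch on a hypothetical `toy_output` frame; it writes to an in-memory buffer instead of a file so the result can be inspected directly.

```python
import io
import pandas as pd

# Hypothetical two-row report with a float index, as in the output above.
toy_output = pd.DataFrame(
    {"fund_missing_at_vw": ["Carvanha", "None"], "fund_closed": ["Clefairy", "None"]},
    index=pd.Index([8.0, 1.0], name="program_id"),
)

# Cast the float index (8.0) to int (8), then sort by program_id before writing.
toy_output.index = toy_output.index.astype(int)

buffer = io.StringIO()
toy_output.sort_index().to_csv(buffer)
csv_lines = buffer.getvalue().splitlines()
```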