# Generate data for weight time series assignment
I'm going to generate a dataset that contains the weights of twenty individuals over six weeks. Half will be male and will weigh more, on average, than the other half, who will be female. I also want half of the people to gain weight and the other half to lose it.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Observed data

In [5]:
"""Parameters"""
# Select mean weight of each sex
means = {'male': 170, 'female': 140}

# We'll handle each sex separately. Note: we could sample from multiple means at once, but that does not generalize
# to groups of different sizes (e.g., if we had different numbers of males and females)
df = pd.DataFrame()
seed = 123
for sex in ['male', 'female']:
    """Initialize data"""
    # Seed for reproducibility
    np.random.seed(seed)
    # Draw 60 values from a normal distribution, shaped as 10 people x 6 weeks
    data = np.random.normal(means[sex], 2.5, (10, 6))

    """Introduce trend"""
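    # Let's have some people lose weight while others gain weight
    # We will sample weight fluctuations and then assign half of the people to lose and the other half to gain
    # Note: re-seeding with the same value replays the same underlying random draws used for `data`,
    # so the fluctuations are correlated with the baseline weights
    np.random.seed(seed)
    # Advance the seed so the next iteration (the other sex) starts from a different state
    seed += 1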
    fluctuations = np.abs(np.random.normal(1, .5, (10, 6)))
    # alternate gains and losses
    direction = np.array([1, -1] * 5)
    # Take a look at this variable and you will see that each row alternates +/-
    weighted_flux = direction.reshape(-1, 1) * fluctuations
    # Now let's do a cumulative sum to create a trend (add up the fluctuations from week to week)
    cumulative_flux = weighted_flux.cumsum(1)

    """Combine flux with initial"""
    # `wink` (here we shift our baseline weights by the fluctuations)
    sex_data = data + cumulative_flux
    sex_df = pd.DataFrame(sex_data, columns=[f'Week_{x}' for x in range(data.shape[1])])
    sex_df['sex'] = sex
    df = pd.concat([df, sex_df])
df['person'] = [f'P{x}' for x in range(df.shape[0])]
df = df.set_index(['person', 'sex'])
df.head()

person,sex,Week_0,Week_1,Week_2,Week_3,Week_4,Week_5
P0,male,167.743108,174.449221,173.804793,169.578463,172.608399,180.009209
P1,male,163.719962,167.928835,170.532989,164.634668,164.443746,164.951544
P2,male,175.474169,170.828989,172.094298,172.901199,181.604867,183.650401
P3,male,171.008108,168.270346,167.779617,167.91766,161.319162,165.010405
P4,male,167.238358,169.459805,174.7755,169.221694,173.37319,172.138097
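
The trend-building step in the cell above (alternating signs, broadcasting, then a cumulative sum) is easier to see on a toy array. The following is a minimal sketch with made-up numbers for two people and three weeks; the values and the names `toy_direction` and `toy_fluctuations` are illustrative assumptions, not part of the generated dataset.

```python
import numpy as np

# Hypothetical weekly fluctuations for 2 people over 3 weeks (made-up values)
toy_fluctuations = np.array([[1.0, 2.0, 0.5],
                             [0.5, 1.0, 1.5]])
# First person gains (+1), second person loses (-1)
toy_direction = np.array([1, -1])

# Reshape to a column so each person's sign broadcasts across their weeks
toy_weighted = toy_direction.reshape(-1, 1) * toy_fluctuations
# Accumulate along the week axis to turn weekly changes into a running trend
toy_cumulative = toy_weighted.cumsum(axis=1)
print(toy_cumulative)
```

The first row only ever increases and the second only ever decreases, which is the same gain/lose pattern that gets added to the baseline weights.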


# Metadata
Now that we have the observed data, I'm going to generate some metadata associated with each person.

In [6]:
"""Get person and associated sex"""
# We already have this information from above, so I'm just going to grab it
md = df.reset_index()[['person', 'sex']]
# Let's also add which state each person is from. Let's do 4 different states
md['state'] = ['MA', 'NY', 'CA', 'TX'] * 5
md.head()

,person,sex,state
0,P0,male,MA
1,P1,male,NY
2,P2,male,CA
3,P3,male,TX
4,P4,male,MA
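
Because the state list simply cycles through four labels, it may be worth checking how the states fall across the two sexes. Below is a minimal sketch that rebuilds the same 20-row layout from scratch (ten `male` rows followed by ten `female` rows, as in the cells above) and tabulates it with `pd.crosstab`. The names `md_check` and `counts` are introduced here only for illustration.

```python
import pandas as pd

# Rebuild the metadata layout: P0-P9 male, P10-P19 female, states cycling
md_check = pd.DataFrame({
    'person': [f'P{x}' for x in range(20)],
    'sex': ['male'] * 10 + ['female'] * 10,
    'state': ['MA', 'NY', 'CA', 'TX'] * 5,
})

# Count people per (sex, state) combination
counts = pd.crosstab(md_check['sex'], md_check['state'])
print(counts)
```

Each state appears five times overall, but the split within each sex is 3 versus 2 rather than even, because ten people do not divide evenly across four states.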


# Save

In [7]:
"""Now I'll save these to files"""
# I'm going to remove the sex from `df` so you'll have to figure out how to get it back later *wink*
df_save = df.reset_index()
df_save.drop(columns='sex').to_csv('weight_data.csv', index=False)
# The metadata is good as it is, so we'll just save it
md.to_csv('weight_metadata.csv', index=False)
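
After saving, a quick round trip can confirm the files look as intended. This is a minimal sketch that uses a tiny stand-in table written to a temporary directory rather than the real files; the names `toy`, `toy_md`, and the temporary paths are assumptions for illustration only.

```python
import os
import tempfile

import pandas as pd

# Tiny stand-in for the observed data (made-up values)
toy = pd.DataFrame({'person': ['P0', 'P1'], 'sex': ['male', 'female'],
                    'Week_0': [170.1, 139.8], 'Week_1': [171.0, 138.9]})
toy_md = toy[['person', 'sex']]

with tempfile.TemporaryDirectory() as tmp:
    data_path = os.path.join(tmp, 'weight_data.csv')
    md_path = os.path.join(tmp, 'weight_metadata.csv')
    # Save the data without the sex column, and the metadata as-is
    toy.drop(columns='sex').to_csv(data_path, index=False)
    toy_md.to_csv(md_path, index=False)
    # Read both files back while the temporary directory still exists
    data_back = pd.read_csv(data_path)
    md_back = pd.read_csv(md_path)

# The saved data should not carry the sex column
print('sex' in data_back.columns)
# Both files should list the same people in the same order
print(list(data_back['person']) == list(md_back['person']))
```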