### Wide to Long Conversions

I have a dataset of cow-level dairy data arranged in a __wide format__.

I would like it to be in a __long format__.

How do we solve this problem?

Assume that:
- The data is too big to fit in memory.
- A `wide_to_long` function is available, like [this one in pandas](https://pandas.pydata.org/docs/reference/api/pandas.wide_to_long.html).
- The wide-to-long conversion cannot be applied to all rows at once without running out of memory.
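
A minimal sketch of the chunked part. It assumes the wide file is a CSV and that each wide row can be reshaped on its own. `pd.read_csv` with its `chunksize` parameter returns an iterator of DataFrames, so only one chunk is in memory at a time, and each reshaped chunk is appended to the output. `reshape_in_chunks` is a hypothetical helper, and `reshape` stands for whatever per-chunk wide-to-long function is used (a toy `melt` in the demo).

```python
from io import StringIO

import pandas as pd


def reshape_in_chunks(src, out, reshape, chunksize=100_000):
    """Hypothetical helper: stream `src` through `reshape`, appending to `out`.

    `src` is a path or readable buffer; `out` is a writable text handle.
    Only one chunk of the wide table (and its long version) is in memory at a time.
    """
    first = True
    for chunk in pd.read_csv(src, chunksize=chunksize):
        long_chunk = reshape(chunk)
        # Write the header with the first chunk only, then just append rows.
        long_chunk.to_csv(out, header=first, index=False)
        first = False


# Tiny demo: a toy "reshape" (melt) applied two rows at a time.
src = StringIO("id,a,b\n1,10,20\n2,30,40\n3,50,60\n")
out = StringIO()
reshape_in_chunks(src, out, lambda c: c.melt(id_vars="id"), chunksize=2)
print(out.getvalue())
```

For real files, open the output once and pass the handle, e.g. `with open("long.csv", "w", newline="") as out: reshape_in_chunks("wide.csv", out, my_reshape)`, where the file names and `my_reshape` are placeholders.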

### Data Explanation
This data comes from a __Dairy Herd Improvement Association__ (DHIA). The DHIA takes monthly measurements (called "tests") of all the dairy cows on a member farm. Dairy cows produce milk in a "cycle," and the DHIA estimates the "total yield" for the whole cycle with a mathematical formula applied to the monthly measurements.

Description of the data:
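- cow_id: the ID of the dairy cow.
- total_yield: the total milk yield for that cycle.
- max_segment: the total number of tests taken.
- cycle: which production cycle the row belongs to.
- seg_yield: the calculated yield for that test.
- seg_stage: the stage of the production cycle at that test.
- seg_time: the date of the test.

In the wide data, the three `seg_` fields are repeated once per test (`seg1_yield`, `seg2_yield`, and so on).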
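
`pd.wide_to_long` expects the test number as a suffix (stub name followed by a number), whereas here it sits in the middle of the name (`seg1_yield`). A minimal sketch, assuming every per-test column follows the `seg<N>_<field>` pattern: rename the columns to `seg_yield1`, `seg_time1`, ..., add a row id that is unique per row (required by the `i` argument), and reshape. The one-row frame uses cow 2's first two tests from the data below; `to_long_name` and `row_id` are illustrative names.

```python
import re

import pandas as pd


def to_long_name(col):
    """Move the test number to the end: 'seg2_yield' -> 'seg_yield2'."""
    return re.sub(r"^seg(\d+)_(\w+)$", r"seg_\2\1", col)


# First two tests of cow 2, cycle 1, in the wide layout.
wide = pd.DataFrame({
    "cow_id": [2], "cycle": [1],
    "seg1_yield": [4], "seg1_time": ["3/13/2013"], "seg1_stage": [1],
    "seg2_yield": [4], "seg2_time": ["4/13/2013"], "seg2_stage": [2],
})
wide = wide.rename(columns=to_long_name)
wide["row_id"] = range(len(wide))  # wide_to_long needs ids that are unique per row

long_df = pd.wide_to_long(
    wide,
    stubnames=["seg_yield", "seg_time", "seg_stage"],
    i="row_id",
    j="segment",  # receives the numeric suffix, i.e. the test number
).reset_index()
print(long_df[["cow_id", "cycle", "segment", "seg_yield", "seg_time", "seg_stage"]])
```

Columns that are neither `i` nor a stub (`cow_id`, `cycle` here) are repeated on every long row.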

### Objective: Make Data1 into Data2

```python
import pandas as pd
from io import StringIO

data1 = '''
cow_id     , total_yield, max_segment, cycle, seg1_yield, seg1_time, seg1_stage, seg2_yield, seg2_time, seg2_stage, seg3_yield, seg3_time, seg3_stage
1          , 10         , 1          , 1    , 10        , 1/3/2013 , 1         ,           ,          ,           ,           ,          , 
1          , 55         , 2          , 1    , 10        , 1/3/2013 , 1         , 6         , 2/3/2013 , 2         ,           ,          , 
2          , 306        , 3          , 1    , 4         , 3/13/2013, 1         , 4         , 4/13/2013, 2         , 12        , 5/13/2013, 3
2          , 35         , 1          , 2    , 10        , 7/3/2013 , 1         ,           ,          ,           ,           ,          , 
'''

data2 = '''
cow_id     , total_yield, max_segment, cycle, seg_yield , seg_time , seg_stage 
1          , 10         , 1          , 1    , 10        , 1/3/2013 , 1         
1          , 55         , 2          , 1    , 6         , 2/3/2013 , 2         
2          ,            , 3          , 1    , 4         , 3/13/2013, 1         
2          ,            , 3          , 1    , 6         , 4/13/2013, 2         
2          , 306        , 3          , 1    , 12        , 5/13/2013, 3         
2          , 35         , 1          , 2    , 10        , 7/3/2013 , 1               
'''

# The sample tables are padded with spaces for readability. No field contains a
# meaningful space, so strip them all before parsing to get clean names and values.
df1 = pd.read_csv(StringIO(data1.replace(' ', '')))
df2 = pd.read_csv(StringIO(data2.replace(' ', '')))
```

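Output of `df1`:

| | cow_id | total_yield | max_segment | cycle | seg1_yield | seg1_time | seg1_stage | seg2_yield | seg2_time | seg2_stage | seg3_yield | seg3_time | seg3_stage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 10 | 1 | 1 | 10 | 1/3/2013 | 1 |  |  |  |  |  |  |
| 1 | 1 | 55 | 2 | 1 | 10 | 1/3/2013 | 1 | 6.0 | 2/3/2013 | 2.0 |  |  |  |
| 2 | 2 | 306 | 3 | 1 | 4 | 3/13/2013 | 1 | 4.0 | 4/13/2013 | 2.0 | 12.0 | 5/13/2013 | 3.0 |
| 3 | 2 | 35 | 1 | 2 | 10 | 7/3/2013 | 1 |  |  |  |  |  |  |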


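Output of `df2`:

| | cow_id | total_yield | max_segment | cycle | seg_yield | seg_time | seg_stage |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 10.0 | 1 | 1 | 10 | 1/3/2013 | 1 |
| 1 | 1 | 55.0 | 2 | 1 | 6 | 2/3/2013 | 2 |
| 2 | 2 |  | 3 | 1 | 4 | 3/13/2013 | 1 |
| 3 | 2 |  | 3 | 1 | 6 | 4/13/2013 | 2 |
| 4 | 2 | 306.0 | 3 | 1 | 12 | 5/13/2013 | 3 |
| 5 | 2 | 35.0 | 1 | 2 | 10 | 7/3/2013 | 1 |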
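
One reading of the two tables that reproduces the layout of `df2`: reshape each wide row with `pd.wide_to_long`, drop tests that were never taken, treat a test repeated in a later row of the same cow and cycle as a duplicate (cow 1's first test appears in both of its rows), and keep `total_yield` only on the test whose number equals `max_segment`. These rules are an assumption inferred from the tables, not something the question states. `wide_chunk_to_long`, `STUBS`, `LONG_COLUMNS`, and `DATA1` (the question's `data1` with the padding removed) are illustrative names. `data1` gives 4 for cow 2's second test where `data2` shows 6, so the sketch outputs 4 there.

```python
import re
from io import StringIO

import pandas as pd

STUBS = ["seg_yield", "seg_time", "seg_stage"]
LONG_COLUMNS = ["cow_id", "total_yield", "max_segment", "cycle"] + STUBS


def wide_chunk_to_long(chunk):
    """Reshape one chunk of wide rows into the long layout of data2 (assumed rules)."""
    # seg2_yield -> seg_yield2, so the test number becomes a numeric suffix.
    chunk = chunk.rename(columns=lambda c: re.sub(r"^seg(\d+)_(\w+)$", r"seg_\2\1", c))
    chunk = chunk.assign(row_id=range(len(chunk)))  # unique id required by wide_to_long
    long_df = (
        pd.wide_to_long(chunk, stubnames=STUBS, i="row_id", j="segment")
        .reset_index()
        .dropna(subset=["seg_yield"])  # tests that were never taken
        .sort_values(["row_id", "segment"])
        # Assumption: a test repeated in a later row of the same cow/cycle is a duplicate.
        .drop_duplicates(subset=["cow_id", "cycle", "segment"], keep="first")
        # Assumption: total_yield belongs only to the last test of its wide row.
        .assign(total_yield=lambda d: d["total_yield"].where(d["segment"] == d["max_segment"]))
    )
    return long_df[LONG_COLUMNS].reset_index(drop=True)


# data1 from the question, with the padding spaces removed.
DATA1 = """\
cow_id,total_yield,max_segment,cycle,seg1_yield,seg1_time,seg1_stage,seg2_yield,seg2_time,seg2_stage,seg3_yield,seg3_time,seg3_stage
1,10,1,1,10,1/3/2013,1,,,,,,
1,55,2,1,10,1/3/2013,1,6,2/3/2013,2,,,
2,306,3,1,4,3/13/2013,1,4,4/13/2013,2,12,5/13/2013,3
2,35,1,2,10,7/3/2013,1,,,,,,
"""

# Process two wide rows at a time, as a stand-in for a large file.
result = pd.concat(
    (wide_chunk_to_long(c) for c in pd.read_csv(StringIO(DATA1), chunksize=2)),
    ignore_index=True,
)
print(result)
```

Duplicates are only detected within a chunk, so this assumes all rows of one cow and cycle fall in the same chunk. With a plain `chunksize`, a group can straddle a chunk boundary, in which case a final de-duplication pass over the long output (or carrying the last cow/cycle over to the next chunk) would still be needed.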
