# <center>Importing and Exporting Data Tasks</center>

In [1]:
import pandas as pd

## Assignment 1: Streamlined Data Ingestion

Now that we have a good idea of what the data prep on transactions should look like,
let's push that into the read_csv function.

Keep an eye on the memory usage before and after. 

* Change the column names to 'Date', 'Store_Number', and 'Transaction_Count'.
* Skip the first row of data.
* Convert columns to the appropriate datatypes. 
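
The three steps above can be sketched on a small in-memory sample. The sample rows and the `sample_csv` / `sample` names are assumptions made for illustration; only the `read_csv` arguments mirror the task. Passing `header=0` together with `names` tells pandas to consume the original header row and replace it with the new names.

```python
import io

import pandas as pd

# Hypothetical three-row sample standing in for transactions.csv
sample_csv = io.StringIO(
    "date,store_nbr,transactions\n"
    "2013-01-01,25,770\n"
    "2013-01-02,1,2111\n"
    "2013-01-02,2,2358\n"
)

sample = pd.read_csv(
    sample_csv,
    header=0,  # row 0 holds the original names; it is consumed, not read as data
    names=["Date", "Store_Number", "Transaction_Count"],  # replacement names
    parse_dates=["Date"],  # parse the date strings into datetimes
    dtype={"Store_Number": "Int8", "Transaction_Count": "Int16"},  # smaller nullable ints
)

print(sample.dtypes)
```

The nullable `Int8` and `Int16` types only work here because the sample values fit in their ranges; a column with larger values would need a wider type.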

Then recreate the columns from the assign assignment in Section 3 by chaining assign onto read_csv.

Some starter code has been provided for you below. Because the dataframe object returned by read_csv doesn't have a name, we need to use a lambda function to refer to the dataframe.

`transactions.assign(
    target_pct=transactions["transactions"] / 2500,
    met_target=(transactions["transactions"] / 2500) >= 1,
    bonus_payable=((transactions["transactions"] / 2500) >= 1) * 100,
    month=transactions["date"].dt.month,
    day_of_week=transactions["date"].dt.dayofweek,
)`

The first one should look like:

`target_pct = lambda x: (x["Transaction_Count"] / 2500)`
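
As a rough sketch of the chained form, the starter code can be rewritten so that every column is a lambda evaluated against the frame returned by `read_csv`. The two-row sample and the `result` name are assumptions for illustration; later lambdas can use columns created by earlier keyword arguments in the same `assign` call.

```python
import io

import pandas as pd

# Hypothetical two-row sample standing in for transactions.csv
sample_csv = io.StringIO(
    "date,store_nbr,transactions\n"
    "2013-01-01,25,770\n"
    "2013-01-02,3,3487\n"
)

result = (
    pd.read_csv(
        sample_csv,
        header=0,
        names=["Date", "Store_Number", "Transaction_Count"],
        parse_dates=["Date"],
        dtype={"Store_Number": "Int8", "Transaction_Count": "Int16"},
    )
    .assign(
        # x is the unnamed dataframe flowing through the chain
        target_pct=lambda x: x["Transaction_Count"] / 2500,
        met_target=lambda x: x["target_pct"] >= 1,      # reuses the column above
        bonus_payable=lambda x: x["met_target"] * 100,  # True -> 100, False -> 0
        month=lambda x: x["Date"].dt.month,
        day_of_week=lambda x: x["Date"].dt.dayofweek,   # Monday is 0
    )
)

print(result)
```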


In [7]:
pd.read_csv("transactions.csv").info(memory_usage="deep")

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 83488 entries, 0 to 83487
Data columns (total 3 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   date          83488 non-null  object
 1   store_nbr     83488 non-null  int64 
 2   transactions  83488 non-null  int64 
dtypes: int64(2), object(1)
memory usage: 6.6 MB
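
To compare memory before and after as numbers rather than reading them off `info()`, one option is `memory_usage(deep=True).sum()`. The sketch below uses generated data, and the names `csv_text`, `raw`, and `lean` are assumptions for illustration. It is not a measurement of the real file.

```python
import io

import pandas as pd

# Build a hypothetical 1,000-row CSV in memory
dates = pd.date_range("2013-01-01", periods=1000, freq="D")
csv_text = pd.DataFrame(
    {"date": dates.strftime("%Y-%m-%d"), "store_nbr": 1, "transactions": 2000}
).to_csv(index=False)

# Default read: dates stay as strings, integers default to int64
raw = pd.read_csv(io.StringIO(csv_text))

# Read with parsed dates and smaller nullable integer types
lean = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["date"],
    dtype={"store_nbr": "Int8", "transactions": "Int16"},
)

raw_bytes = raw.memory_usage(deep=True).sum()    # deep=True counts string contents
lean_bytes = lean.memory_usage(deep=True).sum()
print(f"raw: {raw_bytes} bytes, lean: {lean_bytes} bytes")
```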


In [8]:
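transaction_df = pd.read_csv(
    "transactions.csv",
    header=0,  # treat the first row as the header so `names` can replace it
    names=["Date", "Store_Number", "Transaction_count"],
    skiprows=0,  # 0 skips nothing; the header row is already consumed by header=0
    parse_dates=["Date"],
    dtype={"Store_Number": "Int8", "Transaction_count": "Int16"},
)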

In [11]:
transaction_df = transaction_df.assign(Target_pct=transaction_df["Transaction_count"] / 2500)

In [17]:
transaction_df = transaction_df.assign(
    Met_target=transaction_df["Target_pct"] >= 1,
    bonus_payable=lambda x: x["Met_target"] * 100,  # lambda sees the Met_target column created just above
    month=transaction_df["Date"].dt.month,
    day_of_week=transaction_df["Date"].dt.dayofweek,
)

In [19]:
transaction_df.head()

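|  | Date | Store_Number | Transaction_count | Target_pct | Met_target | bonus_payable | month | day_of_week |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2013-01-01 | 25 | 770 | 0.308 | False | 0 | 1 | 1 |
| 1 | 2013-01-02 | 1 | 2111 | 0.8444 | False | 0 | 1 | 2 |
| 2 | 2013-01-02 | 2 | 2358 | 0.9432 | False | 0 | 1 | 2 |
| 3 | 2013-01-02 | 3 | 3487 | 1.3948 | True | 100 | 1 | 2 |
| 4 | 2013-01-02 | 4 | 1922 | 0.7688 | False | 0 | 1 | 2 |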


In [21]:
transaction_df = transaction_df.astype(
    {"Target_pct": "Float32", "month": "Int8", "day_of_week": "Int8"}
)

In [22]:
transaction_df.info(memory_usage="deep")

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 83488 entries, 0 to 83487
Data columns (total 8 columns):
 #   Column             Non-Null Count  Dtype         
---  ------             --------------  -----         
 0   Date               83488 non-null  datetime64[ns]
 1   Store_Number       83488 non-null  Int8          
 2   Transaction_count  83488 non-null  Int16         
 3   Target_pct         83488 non-null  Float32       
 4   Met_target         83488 non-null  boolean       
 5   bonus_payable      83488 non-null  Int32         
 6   month              83488 non-null  Int8          
 7   day_of_week        83488 non-null  Int8          
dtypes: Float32(1), Int16(1), Int32(1), Int8(3), boolean(1), datetime64[ns](1)
memory usage: 2.3 MB


## Assignment 2: Write to Excel Sheets

Write the data in the transactions dataframe you created above into an Excel workbook.

Write out a separate sheet for each year of the data.

If you prefer, you can write each year of data to a separate csv file.

In [23]:
transaction_df.head()

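|  | Date | Store_Number | Transaction_count | Target_pct | Met_target | bonus_payable | month | day_of_week |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2013-01-01 | 25 | 770 | 0.308 | False | 0 | 1 | 1 |
| 1 | 2013-01-02 | 1 | 2111 | 0.8444 | False | 0 | 1 | 2 |
| 2 | 2013-01-02 | 2 | 2358 | 0.9432 | False | 0 | 1 | 2 |
| 3 | 2013-01-02 | 3 | 3487 | 1.3948 | True | 100 | 1 | 2 |
| 4 | 2013-01-02 | 4 | 1922 | 0.7688 | False | 0 | 1 | 2 |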


In [24]:
transaction_df.tail()

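|  | Date | Store_Number | Transaction_count | Target_pct | Met_target | bonus_payable | month | day_of_week |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 83483 | 2017-08-15 | 50 | 2804 | 1.1216 | True | 100 | 8 | 1 |
| 83484 | 2017-08-15 | 51 | 1573 | 0.6292 | False | 0 | 8 | 1 |
| 83485 | 2017-08-15 | 52 | 2255 | 0.902 | False | 0 | 8 | 1 |
| 83486 | 2017-08-15 | 53 | 932 | 0.3728 | False | 0 | 8 | 1 |
| 83487 | 2017-08-15 | 54 | 802 | 0.3208 | False | 0 | 8 | 1 |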


In [25]:
# Open an ExcelWriter so multiple sheets can be written to one workbook
with pd.ExcelWriter("Transactiondata.xlsx") as writer:
    for year in range(2013, 2018):  # Years to filter by, one sheet per year
        (
            transaction_df
            .loc[transaction_df["Date"].dt.year == year]  # Filter DF to the year in the current iteration
            .to_excel(writer, sheet_name=str(year))  # Write each year's DF to a sheet named for that year
        )

In [26]:
for year in range(2013, 2018):  # Years to filter by, one CSV file per year
    (
        transaction_df
        .loc[transaction_df["Date"].dt.year == year]  # Filter DF to the year in the current iteration
        .to_csv(f"transactions_{year}.csv")  # Write each year's DF to a CSV file named for that year
    )
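
A possible variation is to take the years from the data itself instead of hard-coding `range(2013, 2018)`, by grouping on `Date.dt.year`. The sketch below uses a hypothetical three-row frame and writes to a temporary directory; `df`, `out_dir`, and `written` are illustrative names, not part of the assignment. `index=False` is also an assumption here, chosen to leave the row index out of the files.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical miniature of transaction_df spanning two years
df = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2013-01-01", "2013-06-01", "2014-03-15"]),
        "Transaction_count": [770, 2111, 2358],
    }
)

out_dir = Path(tempfile.mkdtemp())  # scratch directory for the example files
written = []

# groupby yields one (year, sub-frame) pair per year present in the data
for year, year_df in df.groupby(df["Date"].dt.year):
    path = out_dir / f"transactions_{year}.csv"
    year_df.to_csv(path, index=False)  # index=False leaves the row index out
    written.append(path.name)

print(written)
```

The same loop body could call `to_excel(writer, sheet_name=str(year))` inside a `pd.ExcelWriter` block, provided an Excel engine such as openpyxl is installed.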