# Cleaning Data (continued): Reshaping
If you want to type along with me, use [this notebook](https://humboldt.cloudbank.2i2c.cloud/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fbethanyj0%2Fdata271_sp25&branch=main&urlpath=tree%2Fdata271_sp25%2Flectures%2Fdata271_lec28_live.ipynb) instead. 
If you don't want to type and want to follow along just by executing the cells, stay in this notebook. 

In [None]:
import pandas as pd

### Reshaping data

In [None]:
df_weather_wide = pd.read_csv('sample_weather.csv')
df_weather_wide = df_weather_wide.iloc[:,1:]
df_weather_wide

In [None]:
# Set 'date' as the index so each row is labeled by its date
date_index = df_weather_wide.set_index('date')
date_index

In [None]:
# Make a long series
long_df = date_index.stack()
long_df

In [None]:
long_df = long_df.reset_index(name='value')
long_df

In [None]:
long_df.rename(columns = {'level_1':'variable'},inplace=True)
long_df

In [None]:
# If we give the columns axis a name, stack() uses it for the new index level
date_index.columns.name = 'variable'

In [None]:
date_index

In [None]:
# Now renaming isn't necessary
date_index.stack().reset_index(name = 'value')
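
If you don't have `sample_weather.csv` handy, here is a minimal sketch of the same stack-and-reset pattern. The `toy_wide` dataframe and its values are made up for illustration and are not the lecture data.

```python
import pandas as pd

# Hypothetical stand-in for the weather data (values are invented)
toy_wide = pd.DataFrame({
    'date': ['2025-01-01', '2025-01-02'],
    'max_temp': [55, 58],
    'min_temp': [41, 43],
})

# Index by date, then name the columns axis so stack() labels the new level
toy_indexed = toy_wide.set_index('date')
toy_indexed.columns.name = 'variable'

# stack() gives a Series indexed by (date, variable); reset_index makes it a dataframe
toy_long = toy_indexed.stack().reset_index(name='value')
print(toy_long)
```

Each date now appears once per measured variable, so two dates and two variables give four rows.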

In [None]:
# Another way: start again from the original wide dataframe
df_weather_wide

In [None]:
# melt() goes from wide to long in one step (new columns default to 'variable' and 'value')
long_df = df_weather_wide.melt(id_vars = 'date',value_vars = ['max_temp','min_temp','inches_of_rain'])
long_df

In [None]:
# change long format back into wide format
long_df.pivot(index = 'date',columns = 'variable',values='value')
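
To see that `melt` and `pivot` undo each other, here is a small round-trip sketch. Again, `toy_wide` is a made-up stand-in for the weather data, and the `rename_axis` step is just one way to tidy the leftover columns name.

```python
import pandas as pd

# Hypothetical wide data (values are invented)
toy_wide = pd.DataFrame({
    'date': ['2025-01-01', '2025-01-02'],
    'max_temp': [55, 58],
    'min_temp': [41, 43],
})

# Wide -> long: every non-id column becomes a (variable, value) pair
toy_long = toy_wide.melt(id_vars='date', var_name='variable', value_name='value')

# Long -> wide: pivot back, then move 'date' out of the index
round_trip = (
    toy_long.pivot(index='date', columns='variable', values='value')
    .reset_index()
    .rename_axis(columns=None)  # drop the leftover 'variable' label on the columns
)
print(round_trip)
```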

### What to do when there are multiple values in categories

In [None]:
# A new long dataframe
long_df = pd.read_csv('long_data.csv')
long_df = long_df.iloc[:,1:]
long_df.head()

In [None]:
# check the number of entries for each combination of date/category
pd.crosstab(index=long_df.date,columns=long_df.category)

In [None]:
# pivot() raises a ValueError when a date/category pair appears more than once
long_df.pivot(index='date', columns='category', values='sales')

In [None]:
# Use pivot table instead to get the average sales by date and category
long_df.pivot_table(index='date', columns='category', values='sales')

In [None]:
# You can also change the aggregation function; e.g. TOTAL sales by date/category
# (pass the string 'sum'; recent pandas versions warn when given the builtin sum)
wide_df = long_df.pivot_table(index='date', columns='category', values='sales', aggfunc='sum')
wide_df
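
Here is a small sketch of the difference between `pivot` and `pivot_table` when a combination repeats. The `toy_sales` dataframe is invented for illustration; it is not `long_data.csv`.

```python
import pandas as pd

# Hypothetical sales data: Books appears twice on 2025-01-01
toy_sales = pd.DataFrame({
    'date': ['2025-01-01', '2025-01-01', '2025-01-02'],
    'category': ['Books', 'Books', 'Books'],
    'sales': [10.0, 30.0, 5.0],
})

# pivot() has no rule for combining the two Books rows, so it raises
try:
    toy_sales.pivot(index='date', columns='category', values='sales')
except ValueError as err:
    print('pivot failed:', err)

# pivot_table() aggregates duplicates; the default aggfunc is the mean
mean_sales = toy_sales.pivot_table(index='date', columns='category', values='sales')

# Ask for totals instead by passing aggfunc='sum'
total_sales = toy_sales.pivot_table(
    index='date', columns='category', values='sales', aggfunc='sum'
)
print(mean_sales)
print(total_sales)
```

For 2025-01-01 the mean table holds 20.0 and the total table holds 40.0, since the two Books rows are 10.0 and 30.0.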

In [None]:
# Can also use it like crosstab if you choose len as the aggfunc
long_df.pivot_table(index=['date'], columns='category', values=['sales'], aggfunc=len)

In [None]:
# back to a longer format (note that this only has total sales)
wide_df.reset_index().melt(id_vars='date', value_vars=['Books','Clothing','Electronics'])

In [None]:
# You can also choose multiple columns to spread across the top
wide_df2 = long_df.pivot_table(index='date', columns=['category','product'], values='sales', aggfunc='sum')
wide_df2

In [None]:
# Rename columns and reset index to work with it as you normally would
wide_df2.columns = list(map("_".join, wide_df2.columns))
wide_df2.reset_index()
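
When you pivot on two columns, each column label is a tuple such as `('Books', 'Novel')`. The sketch below flattens those tuples on made-up data; `toy_sales` and its product names are hypothetical.

```python
import pandas as pd

# Hypothetical sales data with a category and a product for each row
toy_sales = pd.DataFrame({
    'date': ['2025-01-01', '2025-01-01', '2025-01-02'],
    'category': ['Books', 'Clothing', 'Books'],
    'product': ['Novel', 'Socks', 'Novel'],
    'sales': [10.0, 4.0, 5.0],
})

toy_wide = toy_sales.pivot_table(
    index='date', columns=['category', 'product'], values='sales', aggfunc='sum'
)

# Each column label is a (category, product) tuple; join the parts with '_'
toy_wide.columns = ['_'.join(pair) for pair in toy_wide.columns]

# Move 'date' out of the index so it is an ordinary column again
toy_flat = toy_wide.reset_index()
print(toy_flat.columns.tolist())
```

Note that `reset_index()` returns a new dataframe, so assign the result if you want to keep it.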

## Activity

In [None]:

# Create a DataFrame with data cleaning and reshaping opportunities
data = {
    'Pet Name': ['Fluffy', 'Whiskers', 'Bubbles', 'Spike', 'Coco', 'Maybelle', 'Snowball'],
    'Date Adopted': ['10-01-2023','03-04-2024','01-10-2024','02-14-2024','11-22-2023','01-04-2024','12-25-2025'],
    'Animal Type': ['Cat', 'Cat', 'Fish', 'Dog', 'Fish', 'Dog', 'Cat'],
    'Pet Age': ['3', '2', '13', '5', '4', '3', '2'],
    'Color': ['White', 'Gray', 'Orange', 'White', 'White', 'Black', 'Black'],
    'Happiness Level': ['High', 'Medium', 'High', 'Low', 'High', 'High', 'Medium']
}
df_pets = pd.DataFrame(data)
df_pets

**Activity 1:** Rename the columns of the pets dataframe to be in a better format.

**Activity 2:** Change any datatypes that should be adjusted.  

**Activity 3:** Practice pivoting the dataframe.