# Pivoting duplicate values

#### EXERCISE:
So far, you've used the <code>.pivot_table()</code> method when there are multiple <code>index</code> values you want to hold constant during a pivot. In the video, Dan showed you how you can also use pivot tables to deal with duplicate values by providing an aggregation function through the <code>aggfunc</code> parameter. Here, you're going to combine both these uses of pivot tables.
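
To see why an aggregation function is needed, here is a minimal sketch on a small, made-up long-format DataFrame (the name `dup` and its rows are illustrative, not the exercise data). It assumes a reasonably recent pandas and uses the string alias `"mean"` in place of `np.mean` to keep the example short.

```python
import pandas as pd

# Hypothetical long-format data in which every row appears twice.
dup = pd.DataFrame({
    "Day": [1, 1, 2, 2] * 2,
    "measurement": ["Ozone", "Temp", "Ozone", "Temp"] * 2,
    "reading": [41.0, 67.0, 36.0, 72.0] * 2,
})

# .pivot() cannot combine repeated (Day, measurement) pairs, so it raises.
try:
    dup.pivot(index="Day", columns="measurement", values="reading")
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print("pivot failed:", err)

# .pivot_table() aggregates each group of repeats into a single cell.
wide = dup.pivot_table(index="Day", columns="measurement",
                       values="reading", aggfunc="mean")
print(wide)
```

Because each duplicated pair holds the same value twice, the mean simply returns that value.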

Let's say your data collection method accidentally duplicated your dataset. Such a dataset, in which each row is duplicated, has been pre-loaded as <code>airquality_dup</code>. In addition, the <code>airquality_melt</code> DataFrame from the previous exercise has been pre-loaded. Explore their shapes in the IPython Shell by accessing their <code>.shape</code> attributes to confirm the duplicate rows present in <code>airquality_dup</code>.
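
The shape check can be sketched as follows. The frames `melt` and `dup` are small hypothetical stand-ins for `airquality_melt` and `airquality_dup`, and stacking a frame on itself is only an assumption about how the duplication might have arisen.

```python
import pandas as pd

# Hypothetical stand-in for airquality_melt.
melt = pd.DataFrame({
    "Month": [5, 5, 5, 5],
    "Day": [1, 1, 2, 2],
    "measurement": ["Ozone", "Temp", "Ozone", "Temp"],
    "reading": [41.0, 67.0, 36.0, 72.0],
})

# Simulate the accidental duplication by stacking the frame on itself.
dup = pd.concat([melt, melt], ignore_index=True)

print(melt.shape)  # (4, 4)
print(dup.shape)   # (8, 4): twice the rows, same columns

# Count rows that exactly repeat an earlier row.
n_repeated = int(dup.duplicated().sum())
print(n_repeated)  # 4
```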

You'll see that by using <code>.pivot_table()</code> and the <code>aggfunc</code> parameter, you can not only reshape your data, but also remove duplicates. Finally, you can flatten the pivoted DataFrame with <code>.reset_index()</code>, which turns the <code>'Month'</code> and <code>'Day'</code> index levels back into ordinary columns.
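
One way to check the claim that the aggregation removes duplicates is to compare against explicitly dropping them first. This is a sketch on made-up data. `drop_duplicates()` is not part of the exercise and appears here only as a cross-check.

```python
import pandas as pd

# Hypothetical duplicated long-format data (every row appears twice).
dup = pd.DataFrame({
    "Month": [5] * 8,
    "Day": [1, 1, 2, 2] * 2,
    "measurement": ["Ozone", "Temp", "Ozone", "Temp"] * 2,
    "reading": [41.0, 67.0, 36.0, 72.0] * 2,
})

def to_wide(df):
    # Same pivot as in the exercise, with the string alias for the mean.
    return df.pivot_table(index=["Month", "Day"], columns="measurement",
                          values="reading", aggfunc="mean")

wide_from_dup = to_wide(dup)                       # aggregation absorbs the repeats
wide_from_clean = to_wide(dup.drop_duplicates())   # repeats removed beforehand

# Raises an AssertionError if the two results differ.
pd.testing.assert_frame_equal(wide_from_dup, wide_from_clean)

flat = wide_from_dup.reset_index()
print(flat)
```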

NumPy and pandas have been imported as <code>np</code> and <code>pd</code> respectively.

#### INSTRUCTIONS:
* Pivot <code>airquality_dup</code> by using <code>.pivot_table()</code> with the rows indexed by <code>'Month'</code> and <code>'Day'</code>, the columns indexed by <code>'measurement'</code>, and the values populated with <code>'reading'</code>. Use <code>np.mean</code> for the aggregation function.
* Print the head of <code>airquality_pivot</code>.
* Flatten <code>airquality_pivot</code> by resetting its index.
* Print the head of <code>airquality_pivot</code> and then the head of the original <code>airquality</code> DataFrame to compare their structure.

#### SCRIPT.PY:

```python
import pandas as pd
import numpy as np

# Load the data and rebuild the melted DataFrame from the previous exercise
airquality_dup = pd.read_csv("airquality_dup.csv")
airquality = pd.read_csv("airquality.csv")
airquality_melt = pd.melt(airquality, id_vars=['Month', 'Day'], var_name="measurement", value_name="reading")

# Pivot table the airquality_dup: airquality_pivot
airquality_pivot = airquality_dup.pivot_table(index=["Month", "Day"], columns="measurement", values="reading", aggfunc=np.mean)

# Print the head of airquality_pivot before reset_index
print(airquality_pivot.head())

# Reset the index of airquality_pivot
airquality_pivot = airquality_pivot.reset_index()

# Print the head of airquality_pivot
print(airquality_pivot.head())

# Print the head of airquality
print(airquality.head())
```


```
measurement  Ozone  Solar.R  Temp  Wind
Month Day
5     1       41.0    190.0  67.0   7.4
      2       36.0    118.0  72.0   8.0
      3       12.0    149.0  74.0  12.6
      4       18.0    313.0  62.0  11.5
      5        NaN      NaN  56.0  14.3
measurement  Month  Day  Ozone  Solar.R  Temp  Wind
0                5    1   41.0    190.0  67.0   7.4
1                5    2   36.0    118.0  72.0   8.0
2                5    3   12.0    149.0  74.0  12.6
3                5    4   18.0    313.0  62.0  11.5
4                5    5    NaN      NaN  56.0  14.3
   Ozone  Solar.R  Wind  Temp  Month  Day
0   41.0    190.0   7.4    67      5    1
1   36.0    118.0   8.0    72      5    2
2   12.0    149.0  12.6    74      5    3
3   18.0    313.0  11.5    62      5    4
4    NaN      NaN  14.3    56      5    5
```
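
The output shows three structural differences between the pivoted and original DataFrames: the leftover `measurement` label on the column axis, the column order, and `Temp` printed as a float instead of an integer. Here is a minimal sketch of inspecting and tidying these on a small made-up frame. `original` is a hypothetical stand-in for `airquality`, and the cleanup steps are optional extras, not part of the exercise.

```python
import pandas as pd

# Hypothetical stand-in for the original wide DataFrame.
original = pd.DataFrame({
    "Ozone": [41.0, 36.0],
    "Temp": [67, 72],
    "Month": [5, 5],
    "Day": [1, 2],
})

# Melt, then pivot back and flatten, as in the exercise.
melted = pd.melt(original, id_vars=["Month", "Day"],
                 var_name="measurement", value_name="reading")
flat = melted.pivot_table(index=["Month", "Day"], columns="measurement",
                          values="reading", aggfunc="mean").reset_index()

# The column axis keeps the name of the pivoted column.
print(flat.columns.name)  # measurement

# Drop that label and restore the original column order.
flat.columns.name = None
flat = flat[original.columns]

# Melting put integers and floats into one 'reading' column, so Temp is now float.
print(original["Temp"].dtype, flat["Temp"].dtype)
```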
