- `pivot()` and `pivot_table()`: Group unique values within one or more discrete categories.
- `melt()`: Unpivot a wide DataFrame to a long format.
- `get_dummies()`: Conversions with indicator variables.
- `explode()`: Convert a column of list-like values to individual rows.
- `crosstab()`: Calculate a cross-tabulation of multiple one-dimensional factor arrays.

#### pivot()

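`pivot()` reshapes data without aggregating. The unique values of `columns` become the new column labels, `index` supplies the row labels, and `values` fills the cells. Each index/column pair must therefore be unique.

```python
import numpy as np
import pandas as pd
import datetime as dt

data = {
    "value": range(12),
    "variable": ["A"] * 3 + ["B"] * 3 + ["C"] * 3 + ["D"] * 3,
    "date": pd.to_datetime(["2020-01-03", "2020-01-04", "2020-01-05"] * 4),
}

df = pd.DataFrame(data)
```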

```python
pivoted = df.pivot(index="date", columns="variable", values="value")
pivoted
```

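```python
# Add a second value column. Without `values=`, pivot() uses every remaining column
df["value2"] = df["value"] * 2
print(df)
pivoted = df.pivot(index="date", columns="variable")
# The result has MultiIndex columns (value column, variable), so select one block
pivoted["value2"]
```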

```
    value variable       date  value2
0       0        A 2020-01-03       0
1       1        A 2020-01-04       2
2       2        A 2020-01-05       4
3       3        B 2020-01-03       6
4       4        B 2020-01-04       8
5       5        B 2020-01-05      10
6       6        C 2020-01-03      12
7       7        C 2020-01-04      14
8       8        C 2020-01-05      16
9       9        D 2020-01-03      18
10     10        D 2020-01-04      20
11     11        D 2020-01-05      22
```


```
variable    A   B   C   D
date
2020-01-03  0   6  12  18
2020-01-04  2   8  14  20
2020-01-05  4  10  16  22
```
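The following minimal sketch uses a small made-up frame (the name `dup` is illustrative) to show what happens when an index/column pair occurs more than once. `pivot()` raises a `ValueError`, while `pivot_table()` combines the duplicates with its `aggfunc`, which is the mean by default.

```python
import pandas as pd

# Hypothetical data: the ("2020-01-03", "A") pair appears twice
dup = pd.DataFrame(
    {
        "date": ["2020-01-03", "2020-01-03", "2020-01-04"],
        "variable": ["A", "A", "A"],
        "value": [1, 3, 5],
    }
)

try:
    dup.pivot(index="date", columns="variable", values="value")
    pivot_error = None
except ValueError as exc:
    # pivot() refuses to guess how to combine duplicate entries
    pivot_error = str(exc)

# pivot_table() averages the duplicates: (1 + 3) / 2 == 2.0
table = dup.pivot_table(index="date", columns="variable", values="value")
print(pivot_error)
print(table)
```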


```python
# D and E are drawn at random (no seed), so re-running gives different numbers
df = pd.DataFrame(
    {
        "A": ["one", "one", "two", "three"] * 6,
        "B": ["A", "B", "C"] * 8,
        "C": ["foo", "foo", "foo", "bar", "bar", "bar"] * 4,
        "D": np.random.randn(24),
        "E": np.random.randn(24),
        "F": [dt.datetime(2013, i, 1) for i in range(1, 13)]
        + [dt.datetime(2013, i, 15) for i in range(1, 13)],
    }
)
df
```

```
        A  B    C         D         E          F
0     one  A  foo -0.366899  1.356404 2013-01-01
1     one  B  foo -0.719226 -0.813809 2013-02-01
2     two  C  foo  0.517294 -1.663481 2013-03-01
3   three  A  bar -0.653708 -0.451614 2013-04-01
4     one  B  bar  1.587003  0.752667 2013-05-01
5     one  C  bar  2.060417 -0.922629 2013-06-01
6     two  A  foo -3.146644 -0.759221 2013-07-01
7   three  B  foo -0.028503 -0.122012 2013-08-01
8     one  C  foo  1.331095 -0.354274 2013-09-01
9     one  A  bar -0.695466 -1.838260 2013-10-01
..    ...  ..  ...       ...       ...        ...
```


#### pivot_table()

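`pivot_table()` is used for pivoting with aggregation of numeric data. Unlike `pivot()`, it accepts repeated index/column combinations and combines them with `aggfunc`, which defaults to the mean.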

```python
pivoted = pd.pivot_table(df, values="D", index=["A", "B"], columns=["C"])
pivoted
```

```
C             bar       foo
A     B
one   A -0.919309 -0.294969
      B  0.122931 -0.169763
      C  1.337802  0.401795
three A -1.377278       NaN
      B       NaN  0.289901
      C -0.293518       NaN
two   A       NaN -1.728288
      B  0.371681       NaN
      C       NaN  0.644839
```
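In the table above, missing combinations appear as `NaN`. The following sketch uses a tiny hypothetical frame (`sales` and its columns are made up) to show two `pivot_table()` options that are often used together. `aggfunc` chooses the aggregation, here `"sum"` instead of the default mean. `fill_value` replaces empty cells.

```python
import pandas as pd

sales = pd.DataFrame(
    {
        "region": ["north", "north", "south", "south"],
        "product": ["x", "x", "x", "y"],
        "units": [2, 3, 4, 7],
    }
)

table = pd.pivot_table(
    sales,
    values="units",
    index="region",
    columns="product",
    aggfunc="sum",  # add up the two (north, x) rows instead of averaging them
    fill_value=0,   # north has no "y" rows, so that cell becomes 0 instead of NaN
)
print(table)
```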


```python
# margins=True adds the "All" row, aggregated over every row of df
pivoted = df.pivot_table(index="A", values=["D", "E"], aggfunc=["sum"], margins=True)
pivoted
```

```
            sum
              D         E
A
one    0.956974 -1.647092
three -2.761791  0.419718
two   -1.423536 -0.900478
All   -3.228353 -2.127853
```
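Passing a list to `aggfunc` produces the extra `sum` level in the column header above, and `margins=True` produces the `All` row. The following sketch uses a made-up frame (`small` is illustrative) with two functions. It shows that each function gets its own top-level column group and that the `All` margin is computed from the raw rows, not from the per-group results.

```python
import pandas as pd

small = pd.DataFrame({"A": ["one", "one", "two"], "D": [1.0, 2.0, 4.0]})

table = small.pivot_table(
    index="A",
    values=["D"],
    aggfunc=["sum", "mean"],  # columns become ("sum", "D") and ("mean", "D")
    margins=True,             # append an "All" row aggregated over every row
)
print(table)
```

The `All` entry under `mean` is the mean of all three rows (7 / 3), not the mean of the two group means.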


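```python
# Group the datetime column F into calendar months before pivoting.
# pandas >= 2.2 deprecates the "M" alias in favour of "ME" (month end).
pivoted = df.pivot_table(values="D", index=pd.Grouper(freq="M", key="F"), columns="C")
print(pivoted)
```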


```
C                bar       foo
F
2013-01-31       NaN -0.294969
2013-02-28       NaN -0.169763
2013-03-31       NaN  0.644839
2013-04-30 -1.377278       NaN
2013-05-31  0.122931       NaN
2013-06-30  1.337802       NaN
2013-07-31       NaN -1.728288
2013-08-31       NaN  0.289901
2013-09-30       NaN  0.401795
2013-10-31 -0.919309       NaN
2013-11-30  0.371681       NaN
2013-12-31 -0.293518       NaN
```
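The following sketch uses a small made-up frame (`readings` and its columns are hypothetical) to show the same `pd.Grouper` pattern with deterministic numbers. It uses `freq="MS"`, which labels each monthly bucket with the month start. This avoids the `"M"` to `"ME"` rename in pandas 2.2.

```python
import pandas as pd

readings = pd.DataFrame(
    {
        "when": pd.to_datetime(["2013-01-01", "2013-01-15", "2013-02-01", "2013-02-15"]),
        "kind": ["foo", "foo", "bar", "foo"],
        "value": [1.0, 3.0, 5.0, 7.0],
    }
)

# Bucket "when" into calendar months, then average "value" per month and kind
monthly = readings.pivot_table(
    values="value",
    index=pd.Grouper(key="when", freq="MS"),
    columns="kind",
)
print(monthly)
```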


#### melt()
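`melt()` is roughly the opposite of `pivot()`. One or more columns are kept as identifier variables (`id_vars`), while all other columns, considered measured variables, are "unpivoted" to the row axis. This leaves two non-identifier columns, named `variable` and `value` by default.

Parameters
- `id_vars`: column(s) to keep as identifier variables
- `var_name`: name for the column that holds the former column names (default `"variable"`)
- `value_name`: name for the column that holds the values (default `"value"`)
- `ignore_index`: `True` by default, which discards the original index. Pass `ignore_index=False` to keep the original index labels, repeated as needed.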
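The following sketch applies these parameters to a made-up frame (the names `wide`, `long_default` and `long_named` are illustrative). By default the new columns are called `variable` and `value` and the index is renumbered. `var_name`, `value_name` and `ignore_index=False` change that behaviour.

```python
import pandas as pd

wide = pd.DataFrame(
    {"name": ["John", "Mary"], "height": [5.5, 6.0], "weight": [130, 150]},
    index=["r1", "r2"],
)

# Defaults: "variable"/"value" column names, fresh 0..n-1 index
long_default = wide.melt(id_vars="name")

long_named = wide.melt(
    id_vars="name",
    var_name="attribute",  # replaces the default "variable" column name
    value_name="measure",  # replaces the default "value" column name
    ignore_index=False,    # keep (and repeat) the original index labels
)
print(long_default)
print(long_named)
```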



```python
cheese = pd.DataFrame(
    {
        "first": ["John", "Mary"],
        "last": ["Doe", "Bo"],
        "height": [5.5, 6.0],
        "weight": [130, 150],
    }
)
print(cheese)
cheese.melt(id_vars=["first", "last"], var_name="attribute", value_name="xy")
```

```
  first last  height  weight
0  John  Doe     5.5     130
1  Mary   Bo     6.0     150
```


```
  first last attribute     xy
0  John  Doe    height    5.5
1  Mary   Bo    height    6.0
2  John  Doe    weight  130.0
3  Mary   Bo    weight  150.0
```
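Because `melt()` is roughly the inverse of `pivot()`, the long frame can be turned back into the wide one. The following sketch rebuilds the same `cheese` frame (`long_df` and `wide` are illustrative names). It assumes pandas 1.1 or later, where `pivot()` accepts a list for `index`.

```python
import pandas as pd

cheese = pd.DataFrame(
    {
        "first": ["John", "Mary"],
        "last": ["Doe", "Bo"],
        "height": [5.5, 6.0],
        "weight": [130, 150],
    }
)

long_df = cheese.melt(id_vars=["first", "last"], var_name="attribute", value_name="xy")

# pivot() undoes the melt: the identifier columns become the index again
wide = long_df.pivot(index=["first", "last"], columns="attribute", values="xy")
wide = wide.reset_index()
wide.columns.name = None  # drop the leftover "attribute" label on the columns
print(wide)
```

The weights come back as floats (`130.0`) because the melted `xy` column mixed floats and integers.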


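#### get_dummies()

Converts a categorical variable into indicator ("dummy") variables: one column per distinct value, marking the rows that contain that value.

```python
df = pd.DataFrame({"key": list("bbacab"), "data1": range(6)})
print(df)

# One indicator column each for "a", "b" and "c"
dm = pd.get_dummies(df["key"])
```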


```
  key  data1
0   b      0
1   b      1
2   a      2
3   c      3
4   a      4
5   b      5
```
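The following sketch extends the same frame (`dummies` and `with_dummies` are illustrative names). `prefix` names the indicator columns and `dtype=int` requests 0/1 values. Without `dtype`, the default is `bool` in pandas 2.0 and later and `uint8` in older versions. The indicators are then joined back onto the remaining data.

```python
import pandas as pd

df = pd.DataFrame({"key": list("bbacab"), "data1": range(6)})

# One 0/1 column per distinct key, named key_a, key_b, key_c
dummies = pd.get_dummies(df["key"], prefix="key", dtype=int)

# Attach the indicators to the other columns (both share the same index)
with_dummies = df[["data1"]].join(dummies)
print(with_dummies)
```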


#### explode()

`explode()` transforms each element of a list-like value (such as a column containing lists) into a separate row, duplicating the values in the other columns. The original index label is repeated for every new row.

```python
data = {
    'ID': [1, 2, 3],
    'Items': [['apple', 'banana'], ['orange'], ['grape', 'apple', 'pear']],
}

df = pd.DataFrame(data)
print(df)

df.explode('Items')
```


```
   ID                 Items
0   1       [apple, banana]
1   2              [orange]
2   3  [grape, apple, pear]
```


```
   ID   Items
0   1   apple
0   1  banana
1   2  orange
2   3   grape
2   3   apple
2   3    pear
```
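The following sketch uses a made-up frame (`basket` is illustrative) to show two edge cases and one option. An empty list becomes a single `NaN` row, a scalar such as a plain string is left unchanged, and `ignore_index=True` renumbers the result instead of repeating the original labels.

```python
import pandas as pd

basket = pd.DataFrame({"ID": [1, 2, 3], "Items": [["apple", "banana"], [], "grape"]})

# Default: index labels repeat for each exploded element
exploded = basket.explode("Items")

# ignore_index=True gives a fresh 0..n-1 index
renumbered = basket.explode("Items", ignore_index=True)
print(exploded)
print(renumbered)
```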


#### crosstab()

`crosstab()` computes a cross-tabulation (also known as a contingency table) of two or more factors. By default it counts how often each combination of values occurs, which shows how the values of one variable relate to the values of another.

```python
data = {
    'Gender': ['Male', 'Female', 'Male', 'Male', 'Female'],
    'Education': ['High School', 'College', 'College', 'High School', 'Graduate'],
}

df = pd.DataFrame(data)
print(df)

# Count each (Gender, Education) combination
cross_tab = pd.crosstab(index=df['Gender'], columns=df['Education'])
cross_tab
```

```
   Gender    Education
0    Male  High School
1  Female      College
2    Male      College
3    Male  High School
4  Female     Graduate
```


```
Education  College  Graduate  High School
Gender
Female           1         1            0
Male             1         0            2
```
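The following sketch rebuilds the same data (`counts` and `shares` are illustrative names) to show two common `crosstab()` options. `margins=True` adds an `All` row and column with totals. `normalize="index"` turns each row into proportions that sum to 1.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Gender": ["Male", "Female", "Male", "Male", "Female"],
        "Education": ["High School", "College", "College", "High School", "Graduate"],
    }
)

# Raw counts plus row/column totals
counts = pd.crosstab(df["Gender"], df["Education"], margins=True)

# Share of each education level within each gender
shares = pd.crosstab(df["Gender"], df["Education"], normalize="index")
print(counts)
print(shares)
```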
