- `pivot()` and `pivot_table()`: Group unique values within one or more discrete categories.
- `melt()`: Unpivot a wide DataFrame to a long format.
- `get_dummies()`: Convert categorical values into indicator (dummy) variables.
- `explode()`: Convert a column of list-like values to individual rows.
- `crosstab()`: Calculate a cross-tabulation of multiple one-dimensional factor arrays.

#### pivot()

In [106]:
import numpy as np
import pandas as pd
import datetime as dt

data = {
 "value": range(12),
 "variable": ["A"] * 3 + ["B"] * 3 + ["C"] * 3 + ["D"] * 3,
 "date": pd.to_datetime(["2020-01-03", "2020-01-04", "2020-01-05"] * 4)
}

df = pd.DataFrame(data)
df

   value variable       date
0      0        A 2020-01-03
1      1        A 2020-01-04
2      2        A 2020-01-05
3      3        B 2020-01-03
4      4        B 2020-01-04
5      5        B 2020-01-05
6      6        C 2020-01-03
7      7        C 2020-01-04
8      8        C 2020-01-05
9      9        D 2020-01-03


In [107]:
pivoted = df.pivot(index="date", columns="variable", values="value")
pivoted

variable    A  B  C   D
date
2020-01-03  0  3  6   9
2020-01-04  1  4  7  10
2020-01-05  2  5  8  11


In [110]:
df["value2"] = df["value"] * 10
print(df)
pivoted = df.pivot(index="date", columns="variable")
pivoted["value2"]

    value variable       date  value2
0       0        A 2020-01-03       0
1       1        A 2020-01-04      10
2       2        A 2020-01-05      20
3       3        B 2020-01-03      30
4       4        B 2020-01-04      40
5       5        B 2020-01-05      50
6       6        C 2020-01-03      60
7       7        C 2020-01-04      70
8       8        C 2020-01-05      80
9       9        D 2020-01-03      90
10     10        D 2020-01-04     100
11     11        D 2020-01-05     110


variable     A   B   C    D
date
2020-01-03   0  30  60   90
2020-01-04  10  40  70  100
2020-01-05  20  50  80  110
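
`pivot()` only reshapes; it does not aggregate. The sketch below uses a small hypothetical frame (`dup`, not part of the original notebook) in which one date/variable pair appears twice, to show that `pivot()` raises a `ValueError` in that case while `pivot_table()` aggregates the duplicates.

```python
import pandas as pd

# Hypothetical data: the ("2020-01-03", "A") pair appears twice.
dup = pd.DataFrame(
    {
        "date": pd.to_datetime(["2020-01-03", "2020-01-03", "2020-01-04"]),
        "variable": ["A", "A", "A"],
        "value": [1, 3, 5],
    }
)

try:
    dup.pivot(index="date", columns="variable", values="value")
    pivot_failed = False
except ValueError as exc:
    # pivot() cannot decide which of the duplicate values to keep.
    pivot_failed = True
    print(f"pivot() failed: {exc}")

# pivot_table() aggregates the duplicates instead (mean by default).
aggregated = dup.pivot_table(index="date", columns="variable", values="value")
print(aggregated)
```

This is the main reason to reach for `pivot_table()` whenever the index/column combinations are not guaranteed to be unique.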


In [111]:
df = pd.DataFrame(
    {
        "A": ["one", "one", "two", "three"] * 6,
        "B": ["A", "B", "C"] * 8,
        "C": ["foo", "foo", "foo", "bar", "bar", "bar"] * 4,
        "D": np.random.randn(24),
        "E": np.random.randn(24),
        "F": [dt.datetime(2013, i, 1) for i in range(1, 13)]
        + [dt.datetime(2013, i, 15) for i in range(1, 13)],
    }
)
df

       A  B    C         D         E          F
0    one  A  foo  0.376349 -0.325739 2013-01-01
1    one  B  foo  1.648040  1.480065 2013-02-01
2    two  C  foo -0.682708 -0.640858 2013-03-01
3  three  A  bar -0.788253 -1.261120 2013-04-01
4    one  B  bar  0.347424  0.986559 2013-05-01
5    one  C  bar  0.609088 -0.044883 2013-06-01
6    two  A  foo -0.549947 -0.419775 2013-07-01
7  three  B  foo  0.470777 -1.342602 2013-08-01
8    one  C  foo  1.223613 -2.096644 2013-09-01
9    one  A  bar -1.260162 -0.570821 2013-10-01


#### pivot_table()

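`pivot_table()` is for pivoting with aggregation of numeric data. Rows that share the same index/column combination are aggregated, by default with the mean.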

In [97]:
pivoted = pd.pivot_table(df, values="D", index=["A", "B"], columns=["C"])
pivoted

C             bar       foo
A     B
one   A -0.778522 -0.959167
      B -0.552891  0.815439
      C -0.518121 -0.598301
three A  0.745308       NaN
      B       NaN -1.470399
      C  0.532181       NaN
two   A       NaN -0.514069
      B  1.069973       NaN
      C       NaN -0.133152
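
The empty cells in the output are `NaN`: they mark combinations for which the data has no rows. As a minimal sketch on a hypothetical frame (`small`, chosen so the numbers are easy to verify by hand), the `fill_value` parameter of `pivot_table()` can replace them:

```python
import pandas as pd

# Hypothetical data with one missing (A, C) combination: ("two", "bar").
small = pd.DataFrame(
    {
        "A": ["one", "one", "one", "two"],
        "C": ["foo", "foo", "bar", "foo"],
        "D": [1.0, 3.0, 5.0, 7.0],
    }
)

# Default aggregation is the mean; the missing combination becomes NaN.
default_table = small.pivot_table(values="D", index="A", columns="C")

# fill_value replaces those NaN cells in the result.
filled_table = small.pivot_table(values="D", index="A", columns="C", fill_value=0)

print(default_table)
print(filled_table)
```

Whether 0 is a sensible fill depends on the aggregation: it is natural for sums and counts, but can be misleading for means.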


In [113]:
pivoted = df.pivot_table(index="A", values=["D", "E"], aggfunc={"D": "sum", "E": "mean"})
pivoted

              D         E
A
one    4.217044  0.119502
three -0.446635 -0.551363
two    0.719375 -0.581066


In [115]:
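print(df)
# Group rows into monthly bins on the datetime column "F".
# Note: pandas 2.2 and later deprecate the "M" alias in favor of "ME" (month end).
pivoted = df.pivot_table(values="D", index=pd.Grouper(freq="M", key="F"), columns="C")
print(pivoted)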


        A  B    C         D         E          F
0     one  A  foo  0.376349 -0.325739 2013-01-01
1     one  B  foo  1.648040  1.480065 2013-02-01
2     two  C  foo -0.682708 -0.640858 2013-03-01
3   three  A  bar -0.788253 -1.261120 2013-04-01
4     one  B  bar  0.347424  0.986559 2013-05-01
5     one  C  bar  0.609088 -0.044883 2013-06-01
6     two  A  foo -0.549947 -0.419775 2013-07-01
7   three  B  foo  0.470777 -1.342602 2013-08-01
8     one  C  foo  1.223613 -2.096644 2013-09-01
9     one  A  bar -1.260162 -0.570821 2013-10-01
10    two  B  bar  0.014410 -0.223990 2013-11-01
11  three  C  bar -0.771930  0.694973 2013-12-01
12    one  A  foo  1.474043  1.094251 2013-01-15
13    one  B  foo -0.202011 -1.261479 2013-02-15
14    two  C  foo  0.602588 -0.968656 2013-03-15
15  three  A  bar -0.214734 -0.881009 2013-04-15
16    one  B  bar  0.474877  2.754774 2013-05-15
17    one  C  bar -0.775268  0.872788 2013-06-15
18    two  A  foo  2.360848 -0.370620 2013-07-15
19  three  B  foo  0
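
The monthly grouping done with `pd.Grouper` is easier to follow on a small frame. The sketch below uses hypothetical data (`events`) and the `"MS"` (month start) alias, an assumption made here so that the same code runs on pandas versions before and after the month-end alias changed; the bins are the same calendar months, only labelled by their first day.

```python
import datetime as dt

import pandas as pd

# Hypothetical data: two observations in January, one in February.
events = pd.DataFrame(
    {
        "F": [dt.datetime(2013, 1, 1), dt.datetime(2013, 1, 15), dt.datetime(2013, 2, 1)],
        "C": ["foo", "foo", "foo"],
        "D": [1.0, 3.0, 5.0],
    }
)

# Each row of the result is one calendar month; values are the monthly mean of D.
monthly = events.pivot_table(values="D", index=pd.Grouper(freq="MS", key="F"), columns="C")
print(monthly)
```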

#### melt()
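`melt()` is the opposite of `pivot()`: one or more columns serve as identifier variables, while all other columns, considered measured variables, are unpivoted to the row axis, leaving two non-identifier columns that hold the variable name and its value.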

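Parameters
- `ignore_index=False`: keep the original index instead of resetting it (the default is `True`).
- `var_name`: name of the column that holds the former column labels (`"variable"` unless the column axis is named).
- `value_name`: name of the column that holds the values (default `"value"`).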
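
As a small illustration of these parameters, the following sketch (on a hypothetical `people` frame) compares the defaults with `ignore_index=False`:

```python
import pandas as pd

# Hypothetical frame with one identifier column and two measured columns.
people = pd.DataFrame({"name": ["John", "Mary"], "height": [5.5, 6.0], "weight": [130, 150]})

# Defaults: new columns are called "variable" and "value", index is reset.
default_index = people.melt(id_vars="name")

# ignore_index=False repeats the original row labels instead.
kept_index = people.melt(id_vars="name", ignore_index=False)

print(list(default_index.columns))
print(default_index.index.tolist())
print(kept_index.index.tolist())
```

Keeping the original index is useful when the long rows need to be traced back to the wide row they came from.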

In [119]:
cheese = pd.DataFrame(
    {
        "first": ["John", "Mary"],
        "last": ["Doe", "Bo"],
        "height": [5.5, 6.0],
        "weight": [130, 150],
    }
)
print(cheese)
cheese.melt(id_vars=["first", "last"], ignore_index=False, value_name="xy", var_name="attribute")

  first last  height  weight
0  John  Doe     5.5     130
1  Mary   Bo     6.0     150


  first last attribute     xy
0  John  Doe    height    5.5
1  Mary   Bo    height    6.0
0  John  Doe    weight  130.0
1  Mary   Bo    weight  150.0
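
Because `melt()` undoes a pivot, a melted frame can be pivoted back to the wide layout. The round trip below is a sketch that rebuilds the same data under the hypothetical names `wide`, `long`, and `restored`; passing a list to `index` in `pivot()` assumes pandas 1.1 or later.

```python
import pandas as pd

wide = pd.DataFrame(
    {
        "first": ["John", "Mary"],
        "last": ["Doe", "Bo"],
        "height": [5.5, 6.0],
        "weight": [130, 150],
    }
)

# Wide to long: one row per (person, attribute) pair.
long = wide.melt(id_vars=["first", "last"], var_name="attribute", value_name="xy")

# Long to wide: identifier columns become the index, attribute values the columns.
restored = (
    long.pivot(index=["first", "last"], columns="attribute", values="xy")
    .reset_index()
    .rename_axis(columns=None)  # drop the leftover "attribute" axis label
)
print(restored)
```

The round trip is not perfectly lossless: `weight` comes back as floats, because `melt()` stored both measured columns in a single float column.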


#### get_dummies()

In [104]:
df = pd.DataFrame({"key": list("bbacab"), "data1": range(6)})
print(df)

dm = pd.get_dummies(df["key"])
dm


  key  data1
0   b      0
1   b      1
2   a      2
3   c      3
4   a      4
5   b      5


   a  b  c
0  0  1  0
1  0  1  0
2  1  0  0
3  0  0  1
4  1  0  0
5  0  1  0
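
The output above shows 0/1 integers; newer pandas versions return boolean columns by default. A minimal sketch, using the hypothetical names `df_keys`, `dummies`, and `encoded`, that requests integers explicitly, prefixes the new column names, and attaches them to the remaining data:

```python
import pandas as pd

df_keys = pd.DataFrame({"key": list("bbacab"), "data1": range(6)})

# dtype=int asks for 0/1 integers regardless of the version's default dtype;
# prefix="key" yields columns key_a, key_b, key_c.
dummies = pd.get_dummies(df_keys["key"], prefix="key", dtype=int)

# Attach the indicator columns to the remaining data.
encoded = df_keys[["data1"]].join(dummies)
print(encoded)
```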


#### explode()

In [81]:
data = {'ID': [1, 2, 3],
        'Items': [['apple', 'banana'], ['orange'], ['grape', 'apple', 'pear']]}

df = pd.DataFrame(data)
print(df)

df.explode('Items')


   ID                 Items
0   1       [apple, banana]
1   2              [orange]
2   3  [grape, apple, pear]


   ID   Items
0   1   apple
0   1  banana
1   2  orange
2   3   grape
2   3   apple
2   3    pear
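
Note that `explode()` repeats the original index label for every element of a list. The sketch below, on a hypothetical `baskets` frame, shows two related behaviours: `ignore_index=True` (assuming pandas 1.1 or later) renumbers the rows, and an empty list is kept as a single row with a missing value.

```python
import pandas as pd

# Hypothetical data: the second basket is empty.
baskets = pd.DataFrame({"ID": [1, 2, 3], "Items": [["apple", "banana"], [], ["pear"]]})

# One row per list element; the empty list becomes one row holding NaN.
exploded = baskets.explode("Items", ignore_index=True)
print(exploded)
```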


#### crosstab()

`crosstab()` computes a cross-tabulation (also known as a contingency table) of two or more categorical variables. By default it counts how often each combination of values occurs, which shows in tabular form how the values of one variable relate to the values of another.

In [89]:
data = {'Gender': ['Male', 'Female', 'Male', 'Male', 'Female'],
        'Education': ['High School', 'College', 'College', 'High School', 'Graduate']}

df = pd.DataFrame(data)
print(df)

# Create a cross-tabulation
cross_tab = pd.crosstab(index=df['Gender'], columns=df['Education'])
cross_tab

   Gender    Education
0    Male  High School
1  Female      College
2    Male      College
3    Male  High School
4  Female     Graduate


Education  College  Graduate  High School
Gender
Female           1         1            0
Male             1         0            2
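
Two optional parameters of `crosstab()` are often useful on top of the raw counts. The sketch below rebuilds the same data under the hypothetical name `survey` and shows `margins=True` for totals and `normalize="index"` for row-wise proportions:

```python
import pandas as pd

survey = pd.DataFrame(
    {
        "Gender": ["Male", "Female", "Male", "Male", "Female"],
        "Education": ["High School", "College", "College", "High School", "Graduate"],
    }
)

# margins=True appends an "All" row and column holding the totals.
with_totals = pd.crosstab(survey["Gender"], survey["Education"], margins=True)

# normalize="index" turns each row into proportions that sum to 1.
row_share = pd.crosstab(survey["Gender"], survey["Education"], normalize="index")

print(with_totals)
print(row_share)
```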
