# 10 minutes to pandas
This notebook is a study of pandas, a Python library for data processing.
 
[Reference: 10 Minutes to pandas](https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html)

---
**Contents**
1. Object Creation
2. Viewing Data
3. Selection
4. Missing Data
5. Operation
6. Merge
7. Grouping
8. Reshaping
9. Time Series
10. Categoricals
11. Plotting
12. Getting Data In / Out
13. Gotchas

In [None]:
# Customarily, we import as follows
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

## 1. Object Creation

Creating a **Series** by passing a list of values, letting pandas create a default integer index.

In [None]:
s = pd.Series([1, 3, 5, np.nan, 6, 8])
s

0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64
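As a small sketch of the alternative to the default integer index, a Series can also be given explicit labels through the `index` argument. The labels and the name `s_labeled` below are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical string labels, used instead of the default 0..5 integer index
labels = ["a", "b", "c", "d", "e", "f"]
s_labeled = pd.Series([1, 3, 5, np.nan, 6, 8], index=labels)

# The NaN entry forces a float dtype, just as in the example above
print(s_labeled.dtype)

# Values can now be looked up by label
print(s_labeled["c"])
```

The values are the same as in `s`; only the way rows are addressed changes.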

Creating a **DataFrame** by passing a NumPy array, with a datetime index and labeled columns.

In [None]:
dates = pd.date_range("20210101", periods=6)
dates

DatetimeIndex(['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04',
               '2021-01-05', '2021-01-06'],
              dtype='datetime64[ns]', freq='D')

In [None]:
df = pd.DataFrame(
    data=np.random.randn(6, 4),
    index=dates,
    columns=list("ABCD")
)
df

                   A         B         C         D
2021-01-01  0.138698  1.296307  1.493235  0.770369
2021-01-02 -1.517480 -2.266206 -1.423929  1.206131
2021-01-03 -0.582158  1.550855  0.179526  0.088367
2021-01-04  0.266938 -0.625908 -1.592992 -1.030203
2021-01-05 -2.414502 -0.593594  0.009038 -0.061283
2021-01-06  1.058445  0.025756  0.888988  0.287454
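Because `np.random.randn` draws fresh numbers on every run, the values shown above will differ each time the cell is executed. One way to get repeatable values is a seeded NumPy generator; the following is a minimal sketch, where the seed `0` and the names `rng`, `df_seeded`, and `df_again` are arbitrary choices and not part of the original notebook.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20210101", periods=6)

# A seeded generator yields the same sequence on every run (seed is arbitrary)
rng = np.random.default_rng(seed=0)
df_seeded = pd.DataFrame(
    rng.standard_normal((6, 4)),
    index=dates,
    columns=list("ABCD"),
)

# Re-creating the generator with the same seed reproduces the same frame
rng_again = np.random.default_rng(seed=0)
df_again = pd.DataFrame(
    rng_again.standard_normal((6, 4)),
    index=dates,
    columns=list("ABCD"),
)
print(df_seeded.equals(df_again))
```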


Creating a **DataFrame** by passing a dict of objects that can be converted into a series-like structure. Scalar values such as `1.0` and `"foo"` are broadcast to every row.

In [None]:
df2 = pd.DataFrame(
    {
        "A": 1.0,
        "B": pd.Timestamp("20210101"),
        "C": pd.Series(1, index=list(range(4)), dtype="float32"),
        "D": np.array([3] * 4, dtype="int32"),
        "E": pd.Categorical(["test", "train", "test", "train"]),
        "F": "foo",
    }
)
df2

     A          B    C  D      E    F
0  1.0 2021-01-01  1.0  3   test  foo
1  1.0 2021-01-01  1.0  3  train  foo
2  1.0 2021-01-01  1.0  3   test  foo
3  1.0 2021-01-01  1.0  3  train  foo


The columns of the resulting **DataFrame** have different **dtypes**.

In [None]:
df2.dtypes

A           float64
B    datetime64[ns]
C           float32
D             int32
E          category
F            object
dtype: object
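Since each column carries its own dtype, columns can be filtered or converted by dtype. The sketch below rebuilds a reduced version of `df2` (only columns A, C, D, and F, to keep it short) and uses `select_dtypes` and `astype`; the names `numeric` and `widened` are hypothetical.

```python
import numpy as np
import pandas as pd

# Reduced version of df2 from above, kept small for illustration
df2 = pd.DataFrame(
    {
        "A": 1.0,
        "C": pd.Series(1, index=list(range(4)), dtype="float32"),
        "D": np.array([3] * 4, dtype="int32"),
        "F": "foo",
    }
)

# Keep only the numeric columns (float64, float32, int32 here)
numeric = df2.select_dtypes(include="number")
print(list(numeric.columns))

# Convert selected columns to wider dtypes; df2 itself is left unchanged
widened = df2.astype({"C": "float64", "D": "int64"})
print(widened.dtypes)
```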

## 2. Viewing Data

Here is how to view the top and bottom rows of the frame.

In [None]:
df.head()

                   A         B         C         D
2021-01-01  0.138698  1.296307  1.493235  0.770369
2021-01-02 -1.517480 -2.266206 -1.423929  1.206131
2021-01-03 -0.582158  1.550855  0.179526  0.088367
2021-01-04  0.266938 -0.625908 -1.592992 -1.030203
2021-01-05 -2.414502 -0.593594  0.009038 -0.061283


In [None]:
df.tail(3)

                   A         B         C         D
2021-01-04  0.266938 -0.625908 -1.592992 -1.030203
2021-01-05 -2.414502 -0.593594  0.009038 -0.061283
2021-01-06  1.058445  0.025756  0.888988  0.287454
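Both `head` and `tail` take an optional row count `n`, which defaults to 5. A negative `n` in `head` returns everything except the last `|n|` rows. The sketch below uses a hypothetical single-column frame `df_demo` so the row counts are easy to check.

```python
import pandas as pd

# Hypothetical six-row frame with one integer column
df_demo = pd.DataFrame(
    {"x": range(6)},
    index=pd.date_range("20210101", periods=6),
)

print(len(df_demo.head()))    # default n=5
print(len(df_demo.tail(3)))   # last three rows
print(len(df_demo.head(-2)))  # all rows except the last two
```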


Display the index and the columns.

In [None]:
df.index

DatetimeIndex(['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04',
               '2021-01-05', '2021-01-06'],
              dtype='datetime64[ns]', freq='D')

In [None]:
df.columns

Index(['A', 'B', 'C', 'D'], dtype='object')
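The index and the columns are the row and column labels of the frame, so their lengths match `df.shape`. A minimal sketch, using a zero-filled stand-in for `df` (the zeros are a placeholder, not the original random data):

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20210101", periods=6)
# Zero-filled stand-in with the same labels as df above
df = pd.DataFrame(np.zeros((6, 4)), index=dates, columns=list("ABCD"))

n_rows, n_cols = df.shape
print(n_rows == len(df.index), n_cols == len(df.columns))

# Both label objects support membership tests
print("A" in df.columns)
print(pd.Timestamp("2021-01-03") in df.index)
```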

## 3. Selection
## 4. Missing Data
## 5. Operation
## 6. Merge
## 7. Grouping
## 8. Reshaping
## 9. Time Series
## 10. Categoricals
## 11. Plotting
## 12. Getting Data In / Out
## 13. Gotchas