### [Reference] <a href="https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf">Pandas Cheat Sheet</a>

### DataFrame (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html)

<img src="https://miro.medium.com/max/1059/1*5zJ9tsVIRvxY83GsO8eyOw.png" width="500" height="350">

**pd.DataFrame(data=None, index: Union[Collection, NoneType] = None, columns: Union[Collection, NoneType] = None, dtype: Union[str, numpy.dtype, ForwardRef('ExtensionDtype'), NoneType] = None, copy: bool = False)**

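- A DataFrame holds tabular (2-dimensional) data and is the main structure used for data processing in data analysis and machine learning
- Because it is 2-dimensional, the data is organized into rows and columns, as in Excel or CSV, and there are two sets of labels, one for rows and one for columns
  - Row labels are called the index, and column labels are called the columns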
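
As a small illustration of the two sets of labels, the sketch below builds a tiny frame and reads back its row labels and column labels. The name `demo_df` and its values are made up for this example.

```python
import pandas as pd

# Hypothetical 2x2 frame, used only to show the two label sets.
demo_df = pd.DataFrame(
    [[1, 2], [3, 4]],
    index=["r1", "r2"],    # row labels (the index)
    columns=["c1", "c2"],  # column labels (the columns)
)

row_labels = list(demo_df.index)
col_labels = list(demo_df.columns)
shape = demo_df.shape  # (number of rows, number of columns)
print(row_labels, col_labels, shape)
```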

In [1]:
import pandas as pd

### Creation
#### 1) Creating from dictionaries

In [2]:
dict1 = [
    {
        "name" : "John",
        "age" : 25,
        "job" : "student"
    },
    {
        "name" : "Nate",
        "age" : 34,
        "job" : "teacher"
    },
    {
        "name" : "Jenny",
        "age" : 30,
        "job" : "developer"
    },
]

In [3]:
df1 = pd.DataFrame(dict1)
df1

,name,age,job
0,John,25,student
1,Nate,34,teacher
2,Jenny,30,developer


In [4]:
dict2 = {"국어":[15,25,35], "영어":[45,55,65], "수학":[75,85,95]}
student_df = pd.DataFrame(dict2)
student_df

,국어,영어,수학
0,15,45,75
1,25,55,85
2,35,65,95
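
The two inputs above have different shapes: `dict1` is a list of per-row dictionaries, while `dict2` is a dictionary of per-column lists. The sketch below, with made-up names (`rows`, `cols`) and values, suggests that both layouts lead to the same frame when they describe the same data.

```python
import pandas as pd

# Row-oriented input: one dictionary per row.
rows = [{"a": 1, "b": 3}, {"a": 2, "b": 4}]

# Column-oriented input: one list per column.
cols = {"a": [1, 2], "b": [3, 4]}

from_rows = pd.DataFrame(rows)
from_cols = pd.DataFrame(cols)

# equals() compares shape, labels, and values.
same = from_rows.equals(from_cols)
print(same)
```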


><b>Creating with an index</b>

In [5]:
df1 = pd.DataFrame(dict1,index=["f1","f2","f3"])
df1

,name,age,job
f1,John,25,student
f2,Nate,34,teacher
f3,Jenny,30,developer


In [6]:
student_df = pd.DataFrame(dict2, index=["1월","2월","3월"])
student_df

,국어,영어,수학
1월,15,45,75
2월,25,55,85
3월,35,65,95


#### 2) Creating from a 2D list

In [7]:
list1 = [
    [1,2,3,4,5],
    [6,7,8,9,10]
]

two_df = pd.DataFrame(list1, index=["내용1","내용2"], columns=["c1","c2","c3","c4","c5"])
two_df

,c1,c2,c3,c4,c5
내용1,1,2,3,4,5
내용2,6,7,8,9,10


#### 3) Creating from a CSV file

In [10]:
df2 = pd.read_csv("./data/sample3.csv", header=None, encoding="cp949")
df2

,0,1,2
0,1,2,3
1,4,5,6
2,7,8,9
3,10,11,12
4,13,14,15


In [11]:
df2 = pd.read_csv("./data/sample3.csv", header=None, encoding="cp949",names=["c1","c2","c3"])
df2

,c1,c2,c3
0,1,2,3
1,4,5,6
2,7,8,9
3,10,11,12
4,13,14,15
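
The effect of `header=None` and `names` can be tried without a file on disk. The sketch below feeds `read_csv` an in-memory string through `io.StringIO`; the CSV text is made up and merely stands in for `sample3.csv`.

```python
import io

import pandas as pd

# Made-up CSV text with no header row, standing in for a file on disk.
csv_text = "1,2,3\n4,5,6\n"

# header=None: the first line is treated as data, so columns are numbered 0, 1, 2.
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

# names=[...]: supply the column labels explicitly.
named = pd.read_csv(io.StringIO(csv_text), header=None, names=["c1", "c2", "c3"])

print(list(no_header.columns))
print(list(named.columns))
```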


In [13]:
# sample1.csv

df2 = pd.read_csv("./data/sample1.csv",encoding="cp949")
df2

,번호,이름,가입일시,나이
0,1,김정수,2017-01-19 11:30:00,25
1,2,박민구,2017-02-07 10:22:00,35
2,3,정순미,2017-01-22 09:10:00,33
3,4,김정현,2017-02-22 14:09:00,45
4,5,홍미진,2017-04-01 18:00:00,17
5,6,김순철,2017-05-14 22:33:07,22
6,7,이동철,2017-03-01 23:44:45,27
7,8,박지숙,2017-01-11 06:04:18,30
8,9,김은미,2017-02-08 07:44:33,51
9,10,장혁철,2017-12-01 13:01:11,16


In [16]:
df3 = pd.read_csv("./data/sample2.csv",encoding="cp949",delimiter="|",index_col=0)
df3

번호,이름,가입일시,나이
1,김정수,2017-01-19 11:30:00,25
2,박민구,2017-02-07 10:22:00,35
3,정순미,2017-01-22 09:10:00,33
4,김정현,2017-02-22 14:09:00,45
5,홍미진,2017-04-01 18:00:00,17
6,김순철,2017-05-14 22:33:07,22
7,이동철,2017-03-01 23:44:45,27
8,박지숙,2017-01-11 06:04:18,30
9,김은미,2017-02-08 07:44:33,51
10,장혁철,2017-12-01 13:01:11,16
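
`delimiter` sets the field separator and `index_col` picks the column to use as the row index. A minimal sketch, assuming a pipe-separated layout similar to `sample2.csv`; the text and the column names `id`, `name`, `age` are invented for the example.

```python
import io

import pandas as pd

# Made-up pipe-separated text with a header row.
psv_text = "id|name|age\n1|Kim|25\n2|Park|35\n"

# delimiter="|" splits fields on the pipe; index_col=0 uses the first column as the index.
members = pd.read_csv(io.StringIO(psv_text), delimiter="|", index_col=0)

print(members.index.name)     # name of the index, taken from the first column
print(list(members.columns))  # the remaining columns
```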


#### 4) Creating from an Excel file

In [18]:
# Titanic

train_df = pd.read_excel("./data/train.xlsx")
train_df.head(5)

,PassengerId,Survived,Pclass,Name,Gender,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In [19]:
train_df.tail(5)

,PassengerId,Survived,Pclass,Name,Gender,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0,,S
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0,B42,S
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.45,,S
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0,C148,C
890,891,0,3,"Dooley, Mr. Patrick",male,32.0,0,0,370376,7.75,,Q


In [20]:
sample_df = pd.read_excel("./data/sample.xlsx")
sample_df

,Sap Co.,대리점,영업사원,전월,금월,TEAM,총 판매수량
0,KI1316,경기수원대리점,이기정,1720000,2952000,1,123
1,KI1451,충청홍성대리점,정미진,4080000,2706000,2,220
2,KI1534,경기화성대리점,경인선,600000,2214000,1,320
3,KI1636,강원속초대리점,이동권,3720000,2870000,3,110
4,KI1735,경기안양대리점,강준석,4800000,2296000,1,134
5,KI1875,제주제주시대리점,민경수,3420000,2346000,4,210
6,KI1917,경기광주대리점,김진혜,1292000,1518000,1,110
7,KI2032,경기평택대리점,고유정,2736000,2139000,2,90
8,KI2153,경기의정부대리점,김은향,1368000,2484000,1,183
9,KI2214,경기성남대리점,이준수,1976000,1518000,3,73


### Inspection

In [21]:
df1.index

Index(['f1', 'f2', 'f3'], dtype='object')

In [22]:
student_df.index

Index(['1월', '2월', '3월'], dtype='object')

In [23]:
train_df.columns

Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Gender', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')

In [24]:
student_df.values

array([[15, 45, 75],
       [25, 55, 85],
       [35, 65, 95]], dtype=int64)
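
The pandas documentation recommends `DataFrame.to_numpy()` over `.values` for getting the underlying array. A short sketch with a made-up frame named `scores`:

```python
import pandas as pd

# Hypothetical frame with two integer columns.
scores = pd.DataFrame({"x": [1, 2], "y": [3, 4]})

# to_numpy() returns the data as a NumPy array, one inner row per DataFrame row.
arr = scores.to_numpy()
print(arr.shape)
```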

In [25]:
train_df.dtypes

PassengerId      int64
Survived         int64
Pclass           int64
Name            object
Gender          object
Age            float64
SibSp            int64
Parch            int64
Ticket          object
Fare           float64
Cabin           object
Embarked        object
dtype: object

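### Deletion
* Modifying methods such as `drop` return a new DataFrame; the original is left unchanged unless the result is applied to it
* To change the DataFrame itself, pass inplace=True; to keep the original intact, assign the result to a new variable instead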
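
The difference between the two styles can be sketched as follows. The frame `people` is a hypothetical stand-in that reuses a few values from `dict1`.

```python
import pandas as pd

people = pd.DataFrame({"name": ["John", "Nate", "Jenny"], "age": [25, 34, 30]})

# Without inplace: drop() returns a new frame and `people` keeps all three rows.
dropped = people.drop([0, 2])
rows_before = len(people)

# With inplace=True: `people` itself is modified and the call returns None.
result = people.drop([0, 2], inplace=True)
rows_after = len(people)

print(rows_before, rows_after, result)
```

Because the in-place call returns `None`, writing `people = people.drop(..., inplace=True)` would discard the frame.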

In [35]:
df1 = pd.DataFrame(dict1)
df1

,name,age,job
0,John,25,student
1,Nate,34,teacher
2,Jenny,30,developer


In [29]:
# inplace=True: apply the deletion to df1 directly

df1.drop([0,2],inplace=True)

In [30]:
df1

,name,age,job
1,Nate,34,teacher


In [34]:
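# Rows where age is 25.
# df1 is re-created first because rows 0 and 2 were dropped in place above;
# the filter returns a new frame and leaves df1 unchanged.

df1 = pd.DataFrame(dict1)
df1[df1.age == 25]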

,name,age,job
0,John,25,student


In [37]:
# axis: the axis to drop along
# 1 (columns)

df1.drop("age",axis=1)

,name,job
0,John,student
1,Nate,teacher
2,Jenny,developer


In [38]:
df1.drop(columns="age")

,name,job
0,John,student
1,Nate,teacher
2,Jenny,developer
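
The two calls above are equivalent ways of dropping a column. A minimal sketch on a hypothetical frame `people`, which also shows a list being used to drop several columns at once:

```python
import pandas as pd

people = pd.DataFrame(
    {"name": ["John", "Nate"], "age": [25, 34], "job": ["student", "teacher"]}
)

# axis=1 and columns= both mean "drop a column"; each returns a new frame.
by_axis = people.drop("age", axis=1)
by_keyword = people.drop(columns="age")
same_result = by_axis.equals(by_keyword)

# A list drops several columns at once.
only_name = people.drop(columns=["age", "job"])

print(same_result, list(only_name.columns))
```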


### Modification

#### 1) Renaming columns
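
The original notebook leaves this section empty. As one possible approach (an assumption, not necessarily what the author intended), column labels can be changed with `rename(columns=...)` or by assigning to `.columns`. The frame `people` and the new labels are made up.

```python
import pandas as pd

people = pd.DataFrame({"name": ["John", "Nate"], "age": [25, 34]})

# rename() maps old labels to new ones and returns a new frame.
renamed = people.rename(columns={"name": "full_name"})

# Assigning to .columns replaces every label at once, in place.
relabeled = people.copy()
relabeled.columns = ["n", "a"]

print(list(renamed.columns), list(relabeled.columns))
```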

#### 2) Changing the index
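
This section is also empty in the original. A hedged sketch of common options: `rename(index=...)` to relabel rows, `set_index` to promote a column to the index, and `reset_index` to move it back. The frame `people` is hypothetical.

```python
import pandas as pd

people = pd.DataFrame({"name": ["John", "Nate"], "age": [25, 34]})

# Relabel individual rows; returns a new frame.
relabeled = people.rename(index={0: "f1", 1: "f2"})

# Use the "name" column as the index.
by_name = people.set_index("name")

# Move the index back into a regular column and restore 0..n-1 labels.
restored = by_name.reset_index()

print(list(relabeled.index), list(by_name.index), list(restored.columns))
```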

#### 3) Adding a column
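
No example is given in the original. One way to add a column, sketched here under the assumption of a small made-up frame `scores`, is to assign to a new column label; `assign` does the same without modifying the original frame.

```python
import pandas as pd

scores = pd.DataFrame({"math": [75, 85], "english": [45, 55]})

# Assigning to a new label adds the column in place.
scores["science"] = [60, 70]

# A new column can also be computed from existing ones.
scores["total"] = scores["math"] + scores["english"] + scores["science"]

# assign() returns a new frame and leaves `scores` unchanged.
with_flag = scores.assign(passed=scores["total"] >= 190)

print(list(with_flag.columns))
```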

#### 4) Adding a row
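
The original stops at the heading. As a sketch of two common options (the author's intended method is unknown), rows can be appended with `pd.concat`, or a single row can be set through `.loc` with a new label. The frame `people` and the added row are invented.

```python
import pandas as pd

people = pd.DataFrame({"name": ["John", "Nate"], "age": [25, 34]})

# Option 1: concatenate with a one-row frame; ignore_index renumbers rows 0..n-1.
new_row = pd.DataFrame([{"name": "Amy", "age": 28}])
extended = pd.concat([people, new_row], ignore_index=True)

# Option 2: assign to a label that does not exist yet (modifies the frame in place).
enlarged = people.copy()
enlarged.loc[2] = ["Amy", 28]

print(len(extended), len(enlarged))
```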