## DataFrame
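- A Series is a one-dimensional structure
- A DataFrame is a two-dimensional structure
  - axis = 0 : along the index (row direction)
  - axis = 1 : along the columns
  - Made up of one or more Series (columns) that share the same index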
- https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html
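
The two axes are easiest to see with a reduction such as `sum`. The sketch below uses a small made-up frame (the names `df`, `a`, and `b` are illustrative and not part of this notebook) to show which way each axis runs.

```python
import pandas as pd

# Hypothetical toy frame, used only to illustrate the two axes.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=0 runs down the index (rows), so there is one result per column.
col_sums = df.sum(axis=0)

# axis=1 runs across the columns, so there is one result per row.
row_sums = df.sum(axis=1)

print(col_sums.tolist())  # [6, 60]
print(row_sums.tolist())  # [11, 22, 33]
```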

## Creating a DataFrame

In [1]:
import pandas as pd

In [5]:
member = {
    'Attack': [111, 222, 333],
    'Defence': [444, 555, 666],
    'Luck': [777, 888, 999]
}


In [7]:
member_df = pd.DataFrame(member)
member_df

| | Attack | Defence | Luck |
|---|---|---|---|
| 0 | 111 | 444 | 777 |
| 1 | 222 | 555 | 888 |
| 2 | 333 | 666 | 999 |


## Accessing columns

In [8]:
member_df['Attack']

0    111
1    222
2    333
Name: Attack, dtype: int64

In [10]:
type(member_df['Attack'])

pandas.core.series.Series

Selecting a single column returns a Series.
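
As a small check of what that Series carries with it, the sketch below rebuilds the same `member_df` and inspects the selected column. It should keep the column label as its `name` and share the frame's index.

```python
import pandas as pd

member_df = pd.DataFrame({
    "Attack": [111, 222, 333],
    "Defence": [444, 555, 666],
    "Luck": [777, 888, 999],
})

attack = member_df["Attack"]

print(type(attack).__name__)                  # Series
print(attack.name)                            # Attack
print(attack.index.equals(member_df.index))   # True
```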

## DataFrame - columns
- Let's look at the parameters the DataFrame constructor accepts.
- https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html

In [12]:
# Attempt 1: build the same DataFrame from a list of lists
data = [[111, 222, 333], [444, 555, 666], [777, 888, 999]]
columns = ['Attack', 'Defence', 'Luck']
member_df = pd.DataFrame(data=data, columns=columns)
member_df

| | Attack | Defence | Luck |
|---|---|---|---|
| 0 | 111 | 222 | 333 |
| 1 | 444 | 555 | 666 |
| 2 | 777 | 888 | 999 |


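That failed! pandas treats each inner list as one row, not one column, so the result is the transpose of the frame we wanted. Let's build it again.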

In [13]:
# Attempt 2: build the same DataFrame from a list of lists, one inner list per row
data = [[111, 444, 777], [222, 555, 888], [333, 666, 999]]
columns = ['Attack', 'Defence', 'Luck']
member_df = pd.DataFrame(data=data, columns=columns)
member_df

| | Attack | Defence | Luck |
|---|---|---|---|
| 0 | 111 | 444 | 777 |
| 1 | 222 | 555 | 888 |
| 2 | 333 | 666 | 999 |
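
A list of lists is read row by row, which is why the first attempt came out transposed. If the values are already grouped per column, there is no need to retype them. The sketch below shows two possible ways around it (the names `by_column`, `df_from_dict`, and `df_transposed` are illustrative only).

```python
import pandas as pd

columns = ["Attack", "Defence", "Luck"]
# One inner list per column, as in the first attempt.
by_column = [[111, 222, 333], [444, 555, 666], [777, 888, 999]]

# Option 1: pair each label with its values to get a dict of columns.
df_from_dict = pd.DataFrame(dict(zip(columns, by_column)))

# Option 2: put the labels on the rows first, then transpose.
df_transposed = pd.DataFrame(by_column, index=columns).T

print(df_from_dict["Attack"].tolist())   # [111, 222, 333]
print(df_transposed["Luck"].tolist())    # [777, 888, 999]
```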


In [17]:
# Preview: inspect the column labels and the row index
member_df.columns, member_df.index


(Index(['Attack', 'Defence', 'Luck'], dtype='object'),
 RangeIndex(start=0, stop=3, step=1))

> **columns : Index or array-like**  
Column labels to use for resulting frame when data does not have them, defaulting to RangeIndex(0, 1, 2, …, n). If data contains column labels, will perform column selection instead.

---
**columns**
- If column labels are supplied, they are used as the columns
- Otherwise the columns default to a RangeIndex starting at 0 (0, 1, 2, ...)
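
Both behaviours described in the quoted documentation can be checked directly. This is a minimal sketch reusing the notebook's values. The names `unlabeled` and `subset` are introduced here only for illustration.

```python
import pandas as pd

data = [[111, 444, 777], [222, 555, 888], [333, 666, 999]]

# No columns given: pandas falls back to integer labels starting at 0.
unlabeled = pd.DataFrame(data)
print(list(unlabeled.columns))  # [0, 1, 2]

# A dict already carries column labels, so columns= selects among them.
member = {
    "Attack": [111, 222, 333],
    "Defence": [444, 555, 666],
    "Luck": [777, 888, 999],
}
subset = pd.DataFrame(member, columns=["Luck", "Attack"])
print(list(subset.columns))  # ['Luck', 'Attack']
```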

## DataFrame - index

In [19]:
# Same data as attempt 2, now with explicit row labels
data = [[111, 444, 777], [222, 555, 888], [333, 666, 999]]
columns = ['Attack', 'Defence', 'Luck']
index = ['Spencer', 'Tommy', 'Uriel']

member_df = pd.DataFrame(data=data, columns=columns, index=index)
member_df

| | Attack | Defence | Luck |
|---|---|---|---|
| Spencer | 111 | 444 | 777 |
| Tommy | 222 | 555 | 888 |
| Uriel | 333 | 666 | 999 |


> **index : Index or array-like**  
Index to use for resulting frame. Will default to RangeIndex if no indexing information part of input data and no index provided.

---
**index**
- If index labels are supplied, they are used as the index
- Otherwise the index defaults to a RangeIndex starting at 0 (0, 1, 2, ...)
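
The difference between the default index and an explicit one shows up as soon as rows are looked up by label. The sketch below uses a shortened two-row version of the data. `default_index` and `labeled` are names made up for this example.

```python
import pandas as pd

data = [[111, 444, 777], [222, 555, 888]]
columns = ["Attack", "Defence", "Luck"]

# No index given: rows are labelled 0, 1, ...
default_index = pd.DataFrame(data, columns=columns)
print(list(default_index.index))  # [0, 1]

# Explicit labels: .loc looks rows up by label rather than by position.
labeled = pd.DataFrame(data, columns=columns, index=["Spencer", "Tommy"])
print(labeled.loc["Tommy", "Defence"])  # 555
```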

In [20]:
# Recommended: bracket access
member_df['Attack']

Spencer    111
Tommy      222
Uriel      333
Name: Attack, dtype: int64

In [21]:
# Works, but not recommended: dot (.) attribute access can cause problems.
member_df.Attack

Spencer    111
Tommy      222
Uriel      333
Name: Attack, dtype: int64
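
Two of the problems dot access can run into are sketched below, on a made-up frame (the `Magic Power` and `Bonus` columns are hypothetical). A label that is not a valid Python identifier cannot be reached with a dot at all, and assigning to a new attribute name does not create a column.

```python
import warnings

import pandas as pd

df = pd.DataFrame({"Attack": [111, 222], "Magic Power": [5, 6]})

# A label containing a space is not a valid identifier, so only brackets work.
magic = df["Magic Power"]
print(magic.tolist())  # [5, 6]

# Assigning through a new attribute name sets a plain Python attribute.
# pandas emits a UserWarning here, silenced for this demo.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df.Bonus = [1, 2]
print("Bonus" in df.columns)  # False

# Bracket assignment is the supported way to add a column.
df["Bonus"] = [1, 2]
print("Bonus" in df.columns)  # True
```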

## DataFrame attributes
- Attributes are accessed with a dot (.); let's look at some that are useful

## .shape

In [22]:
member_df.shape

(3, 3)

`.shape` returns the dimensions of the DataFrame as a tuple: (number of rows, number of columns).
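
Because the frame above is square, the order of the two numbers is not obvious from `(3, 3)`. The sketch below uses a hypothetical 2-row, 3-column frame to make it visible, and relates `.shape` to `len()` and `.size`.

```python
import pandas as pd

# Hypothetical non-square frame: 2 rows, 3 columns.
df = pd.DataFrame(
    [[111, 444, 777], [222, 555, 888]],
    columns=["Attack", "Defence", "Luck"],
)

n_rows, n_cols = df.shape
print(n_rows, n_cols)  # 2 3
print(len(df))         # 2  (number of rows)
print(df.size)         # 6  (rows * columns)
```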

## [Think about it] What if a column were named shape?

In [24]:
df1 = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['shape', 'info', 'index'])
df1

| | shape | info | index |
|---|---|---|---|
| 0 | 1 | 2 | 3 |
| 1 | 4 | 5 | 6 |


In [25]:
df1['shape']

0    1
1    4
Name: shape, dtype: int64

In [27]:
# Does not return the 'shape' column as a Series; returns the DataFrame's dimensions instead
df1.shape

(2, 3)

In [28]:
df1.info

<bound method DataFrame.info of    shape  info  index
0      1     2      3
1      4     5      6>

In [29]:
df1.index

RangeIndex(start=0, stop=2, step=1)

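Unlike `member_df.Attack` in the earlier example, these columns cannot be reached with a dot: `shape`, `info`, and `index` are existing DataFrame attributes and methods, and they take precedence over column names.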
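
The same clash can be checked programmatically. In this minimal sketch on the same `df1`, dot access resolves to the DataFrame's own members, while bracket access still returns the columns.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["shape", "info", "index"])

# Dot access resolves to the DataFrame's own attribute and method.
print(df1.shape)           # (2, 3)
print(callable(df1.info))  # True

# Bracket access always refers to the column.
print(df1["shape"].tolist())  # [1, 4]
print(df1["info"].tolist())   # [2, 5]
print(df1["index"].tolist())  # [3, 6]
```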

## Creating a DataFrame from a CSV file

In [31]:
df2 = pd.read_csv('TopRichestInWorld.csv')
df2

| | Name | NetWorth | Age | Country/Territory | Source | Industry |
|---|---|---|---|---|---|---|
| 0 | Elon Musk | $219,000,000,000 | 50 | United States | Tesla, SpaceX | Automotive |
| 1 | Jeff Bezos | $171,000,000,000 | 58 | United States | Amazon | Technology |
| 2 | Bernard Arnault & family | $158,000,000,000 | 73 | France | LVMH | Fashion & Retail |
| 3 | Bill Gates | $129,000,000,000 | 66 | United States | Microsoft | Technology |
| 4 | Warren Buffett | $118,000,000,000 | 91 | United States | Berkshire Hathaway | Finance & Investments |
| ... | ... | ... | ... | ... | ... | ... |
| 96 | Vladimir Potanin | $17,300,000,000 | 61 | Russia | metals | Metals & Mining |
| 97 | Harold Hamm & family | $17,200,000,000 | 76 | United States | oil & gas | Energy |
| 98 | Sun Piaoyang | $17,100,000,000 | 63 | China | pharmaceuticals | Healthcare |
| 99 | Luo Liguo & family | $17,000,000,000 | 66 | China | chemicals | Manufacturing |
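
To try `read_csv` without the file at hand, an in-memory string can stand in for it. The sketch below assumes the CSV has a header row with the six column names shown above and copies two rows from that output. The real file's exact layout is not known here. Quoted fields keep their embedded commas, and `NetWorth` comes back as text because of the `$` and the thousands separators.

```python
import io

import pandas as pd

# Stand-in for TopRichestInWorld.csv (assumed layout, two rows only).
csv_text = '''Name,NetWorth,Age,Country/Territory,Source,Industry
Elon Musk,"$219,000,000,000",50,United States,"Tesla, SpaceX",Automotive
Jeff Bezos,"$171,000,000,000",58,United States,Amazon,Technology
'''

sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)                # (2, 6)
print(sample["Age"].tolist())      # [50, 58]
print(sample["NetWorth"].iloc[0])  # $219,000,000,000
print(sample["Source"].iloc[0])    # Tesla, SpaceX
```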


In [32]:
# Test: select a single column
df2['Name']

0                     Elon Musk
1                    Jeff Bezos
2      Bernard Arnault & family
3                    Bill Gates
4                Warren Buffett
                 ...           
96             Vladimir Potanin
97         Harold Hamm & family
98                 Sun Piaoyang
99           Luo Liguo & family
100                   Peter Woo
Name: Name, Length: 101, dtype: object

In [33]:
# Single-column access -> Series
type(df2['Name'])

pandas.core.series.Series

In [37]:
# Preview: passing a list of labels (note the double brackets) selects multiple columns -> DataFrame
df2[['Name', 'Age']]

| | Name | Age |
|---|---|---|
| 0 | Elon Musk | 50 |
| 1 | Jeff Bezos | 58 |
| 2 | Bernard Arnault & family | 73 |
| 3 | Bill Gates | 66 |
| 4 | Warren Buffett | 91 |
| ... | ... | ... |
| 96 | Vladimir Potanin | 61 |
| 97 | Harold Hamm & family | 76 |
| 98 | Sun Piaoyang | 63 |
| 99 | Luo Liguo & family | 66 |


In [38]:
type(df2[['Name', 'Age']])

pandas.core.frame.DataFrame
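
The return type depends on what goes inside the brackets, not on how many columns come back. In the sketch below, on a small hypothetical frame, a single label gives a Series, while a list gives a DataFrame even when the list holds one label.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Elon Musk", "Jeff Bezos"], "Age": [50, 58]})

as_series = df["Name"]   # one label -> Series
as_frame = df[["Name"]]  # list with one label -> one-column DataFrame

print(type(as_series).__name__, as_series.shape)  # Series (2,)
print(type(as_frame).__name__, as_frame.shape)    # DataFrame (2, 1)
```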