---
jupyter: python3
toc: true
toc-depth: 3
toc-expand: true
number-sections: true
title: Pandas_06_Data Types
date: 2021-11-05 00:06
categories: pandas
author: limyj0708
comments:
  giscus:
    repo: limyj0708/blog
format:
    html:
        page-layout: full
---

```python
import pandas as pd
import numpy as np
import copy
from IPython.display import display_html, display
```

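```python
# Show several (DataFrame, caption) pairs side by side as inline HTML tables
def display_multiple_dfs(dfs: list, styles, margin=10):
    display_target = ''
    for each_df in dfs:  # each_df = (DataFrame, caption)
        each_df_html = (each_df[0].style
                        .set_caption(f'<b>{each_df[1]}</b>')
                        .set_table_styles(styles)
                        .set_table_attributes(f"style='display:inline;margin:{margin}px'")
                        ._repr_html_())
        display_target += each_df_html
    # raw=True: pass the string through as ready-made HTML
    display_html(display_target, raw=True)
```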

```python
styles = [
    {"selector" : "caption", "props" : "text-align:center; font-size:16px"}
]
```

# dtypes: print the type of each column

```python
df = pd.DataFrame({'float': [1.0],
                   'int': [1],
                   'datetime': [pd.Timestamp('20180310')],
                   'string': ['foo']})
df.dtypes
# No further explanation needed!
```

```
float              float64
int                  int64
datetime    datetime64[ns]
string              object
dtype: object
```
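Because `df.dtypes` is itself a `Series` (indexed by column name, with dtype objects as values), it can be looked up and filtered like any other Series. A minimal sketch, assuming you want the names of the numeric columns; `pd.api.types.is_numeric_dtype` is the pandas helper doing the check, and `numeric_cols` is just an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({'float': [1.0],
                   'int': [1],
                   'datetime': [pd.Timestamp('20180310')]})

dtypes = df.dtypes  # a Series: index = column names, values = dtype objects

# Look up a single column's dtype by name
int_dtype = dtypes['int']

# Keep only the columns whose dtype pandas considers numeric
numeric_cols = [col for col, dt in dtypes.items()
                if pd.api.types.is_numeric_dtype(dt)]
print(numeric_cols)  # ['float', 'int']
```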

# select_dtypes: select or exclude columns of specific types

`DataFrame.select_dtypes(include=None, exclude=None)`

- To select all numeric types, use `np.number` or `'number'`.
- To select strings you must use the `object` dtype, but note that this will return all object dtype columns. See the NumPy dtype hierarchy.
- To select datetimes, use `np.datetime64`, `'datetime'` or `'datetime64'`.
- To select timedeltas, use `np.timedelta64`, `'timedelta'` or `'timedelta64'`.
- To select pandas categorical dtypes, use `'category'`.
- To select pandas datetimetz dtypes, use `'datetimetz'` (new in 0.20.0) or `'datetime64[ns, tz]'`.
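The aliases listed above can be tried on a small frame with one column per kind of dtype. A minimal sketch; the column names `i`, `f`, `b`, `t`, `c` are hypothetical. String columns are left out on purpose, because recent pandas releases may store them with a dedicated string dtype rather than `object`, which changes what the `object` rule picks up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'i': [1, 2],                                        # integer
    'f': [1.5, 2.5],                                    # float
    'b': [True, False],                                 # bool
    't': pd.to_datetime(['2021-11-05', '2021-11-06']),  # datetime64
    'c': pd.Categorical(['lo', 'hi']),                  # category
})

# 'number' matches integer and float columns, but not bool
numeric = df.select_dtypes(include='number').columns.tolist()
# 'datetime' (like np.datetime64 or 'datetime64') matches datetime columns
dates = df.select_dtypes(include='datetime').columns.tolist()
# 'category' matches pandas Categorical columns
cats = df.select_dtypes(include='category').columns.tolist()
# exclude keeps every other column, in the original column order
non_numeric = df.select_dtypes(exclude=[np.number]).columns.tolist()

print(numeric)      # ['i', 'f']
print(dates)        # ['t']
print(cats)         # ['c']
print(non_numeric)  # ['b', 't', 'c']
```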

```python
df = pd.DataFrame({'a': [1, 2] * 3,
                   'b': [True, False] * 3,
                   'c': [1.0, 2.0] * 3})
df
```

|   | a | b | c |
|---|---|---|---|
| 0 | 1 | True | 1.0 |
| 1 | 2 | False | 2.0 |
| 2 | 1 | True | 1.0 |
| 3 | 2 | False | 2.0 |
| 4 | 1 | True | 1.0 |
| 5 | 2 | False | 2.0 |


```python
df.select_dtypes(include='bool')
```

|   | b |
|---|---|
| 0 | True |
| 1 | False |
| 2 | True |
| 3 | False |
| 4 | True |
| 5 | False |


```python
df.select_dtypes(include=['float64'])
```

|   | c |
|---|---|
| 0 | 1.0 |
| 1 | 2.0 |
| 2 | 1.0 |
| 3 | 2.0 |
| 4 | 1.0 |
| 5 | 2.0 |


```python
df.select_dtypes(exclude=['int64'])
```

|   | b | c |
|---|---|---|
| 0 | True | 1.0 |
| 1 | False | 2.0 |
| 2 | True | 1.0 |
| 3 | False | 2.0 |
| 4 | True | 1.0 |
| 5 | False | 2.0 |
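In practice the selected frame is often used only for its `.columns`, so that an operation touches just the matching columns. A minimal sketch on the same `a`/`b`/`c` frame; multiplying by 10 is an arbitrary stand-in for whatever numeric transform you need, and `scaled`, `num_cols` and `empty_call_failed` are illustrative names. It also shows that a list can mix several dtypes, and that calling `select_dtypes()` with neither `include` nor `exclude` raises a `ValueError`.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2] * 3,
                   'b': [True, False] * 3,
                   'c': [1.0, 2.0] * 3})

# A list mixes several dtypes in one call: here bool and float64 columns
bool_or_float = df.select_dtypes(include=['bool', 'float64']).columns.tolist()

# Transform only the numeric columns; 'b' (bool) is left untouched
num_cols = df.select_dtypes(include='number').columns
scaled = df.copy()
scaled[num_cols] = scaled[num_cols] * 10

# At least one of include / exclude must be given
try:
    df.select_dtypes()
    empty_call_failed = False
except ValueError:
    empty_call_failed = True

print(bool_or_float)      # ['b', 'c']
print(empty_call_failed)  # True
```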


# astype: change types (a must when uploading a df to BigQuery)
`DataFrame.astype(dtype, copy=True, errors='raise')`

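- `copy`: with `False`, no copy is made where possible; the result can share data with the original, so changes may propagate back to the original.
- `errors`: with `'ignore'`, exceptions are suppressed and the original object is returned when the conversion fails (the default `'raise'` lets the exception through).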
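The `errors` parameter is easiest to see on a column that cannot be fully converted. A minimal sketch with a hypothetical `code` column holding one non-numeric value: with the default `errors='raise'` the cast fails with an exception, while `errors='ignore'` returns the data unchanged instead of raising.

```python
import pandas as pd

df = pd.DataFrame({'code': ['1', '2', 'x']})  # 'x' cannot become an integer

# Default errors='raise': the failed cast raises an exception
try:
    df.astype({'code': 'int64'})
    raised = False
except (ValueError, TypeError):
    raised = True

# errors='ignore': no exception, the original data comes back as-is
kept = df.astype({'code': 'int64'}, errors='ignore')

print(raised)                 # True
print(kept['code'].tolist())  # ['1', '2', 'x']
```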

```python
d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
df.dtypes
```

```
col1    int64
col2    int64
dtype: object
```

```python
df.astype({'col1': 'int32'}).dtypes
# Converted as expected: col1 is now int32, col2 is untouched
```

```
col1    int32
col2    int64
dtype: object
```
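`astype` accepts either a single dtype, applied to every column, or a dict mapping column names to dtypes, as above; columns missing from the dict keep their current type. It returns a new DataFrame rather than changing `df` in place, so the result has to be assigned back. A minimal sketch on the same `col1`/`col2` frame; `all_float` and `converted` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
original_dtype = df['col1'].dtype

# A single dtype is applied to every column
all_float = df.astype('float64')

# A dict converts only the listed columns; col2 keeps its dtype
converted = df.astype({'col1': 'int32'})

# astype returned new objects: df itself still has its original dtype
print(df['col1'].dtype == original_dtype)  # True

# Reassign to keep the conversion
df = converted
print(df['col1'].dtype)  # int32
```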