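## **Pandas** Data Structures
 - A pandas Series is a one-dimensional data structure similar to a Python list
 - A Series is the data type that represents each column of a DataFrame
 - All values in a column must share the same data type (dtype)
 - A Series provides many attributes and methods, such as index and values

 - A DataFrame can be interpreted as a dictionary in which each key is a column name and each value is a Series
 - Rows automatically receive an integer index (0, 1, 2, ...) unless one is specified
 - In short, a DataFrame is a dictionary-like collection of Series objects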
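
The dictionary interpretation above can be sketched as follows. The names and values here are made up for illustration and are not part of the notebook's data.

```python
import pandas as pd

# Two Series that share the same index labels (hypothetical sample data)
names = pd.Series(["Ada", "Grace"], index=["a", "b"])
ages = pd.Series([36, 85], index=["a", "b"])

# The dict keys become column names, the Series become the columns
people = pd.DataFrame({"name": names, "age": ages})

# Selecting one column gives back a Series with a single dtype
age_col = people["age"]
print(type(age_col).__name__)   # Series
print(age_col.dtype)            # int64
print(list(people.columns))     # ['name', 'age']
```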

In [1]:
import pandas as pd

In [5]:
s = pd.Series(["Banana", 42])
print(s)

0    Banana
1        42
dtype: object


In [6]:
s1 = pd.Series(data = ["박채연", "취준생"], index = ['person', 'who'])
print(s1)

person    박채연
who       취준생
dtype: object


In [11]:
df1 = pd.DataFrame({
    "name" : ['홍길동', "김인정"],
    "생일": ["1950-08-02", "1995-04-08"],
    "사망날짜" : ["2002-05-15","2012-06-05"],
    "나이": [52, 17]
})

print(df1)

  name          생일        사망날짜  나이
0  홍길동  1950-08-02  2002-05-15  52
1  김인정  1995-04-08  2012-06-05  17


In [122]:
# Specify the index

df2 = pd.DataFrame(
    data={
        "Occupation": ["Chemist", "Statistician"],
        "Born": ["1920-07-25", "1876-06-13"],
        "Died": ["1958-04-16", "1937-10-16"],
        "Age": [37, 61],
    },
    index=["Rosaline Franklin", "William Gosset"],
    columns=["Occupation", "Born", "Died", "Age"],
)

print(df2)

                     Occupation        Born        Died  Age
Rosaline Franklin       Chemist  1920-07-25  1958-04-16   37
William Gosset     Statistician  1876-06-13  1937-10-16   61


---
### **Working with Series**

In [125]:
first_row = df2.loc["Rosaline Franklin"]

In [126]:
print(first_row.index)
print(first_row.values)

Index(['Occupation', 'Born', 'Died', 'Age'], dtype='object')
['Chemist' '1920-07-25' '1958-04-16' np.int64(37)]


In [24]:
first_row.index[0]

'Occupation'

In [25]:
first_row.keys() # shows all labels; for a row Series these are the column names

Index(['Occupation', 'Born', 'Died', 'Age'], dtype='object')

In [26]:
first_row.keys()[0] # select a label by position

'Occupation'

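 - loc : extract data by index label
 - iloc : extract data by integer position
 - dtype or dtypes : data type of the values stored in the Series
 - T : transpose (swaps rows and columns; for a one-dimensional Series it returns the Series itself)
 - shape : dimensions of the data
 - size : number of elements in the Series
 - values : the Series as an ndarray or ndarray-like object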
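
A small sketch of the attributes listed above, using a hypothetical Series modeled on one row of the scientists table:

```python
import pandas as pd

# Hypothetical Series whose index holds string labels
row = pd.Series(["Chemist", "1920-07-25", 37], index=["Occupation", "Born", "Age"])

by_label = row.loc["Born"]     # label-based lookup
by_position = row.iloc[1]      # position-based lookup (0-indexed)

print(by_label == by_position)    # True
print(row.shape)                  # (3,)
print(row.size)                   # 3
print(row.dtype)                  # object (mixed strings and an integer)
print(type(row.values).__name__)  # ndarray
```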


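### **Series and ndarray**
 - The pandas Series closely resembles NumPy's ndarray
 - Most methods and functions that work on an ndarray also work on a Series
 - Because a Series holds many values of a single feature, it is also called a vector

 - NumPy is a scientific computing library for working with numeric vectors
 - A Series extends the idea of NumPy's ndarray, so many of the same attributes and methods are available
 - .mean()   .min()   .max()   .std()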
 - https://numpy.org/doc/stable/reference/arrays.ndarray.html
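
As a rough illustration of the overlap with NumPy, the sketch below rebuilds the Age values by hand (so it does not depend on the CSV file) and applies both Series methods and a NumPy function. One difference worth knowing: Series.std() defaults to the sample standard deviation (ddof=1), while np.std on a plain array defaults to ddof=0.

```python
import numpy as np
import pandas as pd

# Age values copied from the scientists table
ages = pd.Series([37, 61, 90, 66, 56, 45, 41, 77], name="Age")

print(ages.mean())             # 59.125
print(ages.min(), ages.max())  # 37 90

# NumPy universal functions accept a Series and return a Series
roots = np.sqrt(ages)
print(type(roots).__name__)    # Series

# Sample std (pandas default) is larger than population std (NumPy default)
print(ages.std() > np.std(ages.to_numpy()))  # True
```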

In [31]:
scientists = pd.read_csv('codes/data/scientists.csv', sep=',')
print(scientists)

                   Name        Born        Died  Age          Occupation
0     Rosaline Franklin  1920-07-25  1958-04-16   37             Chemist
1        William Gosset  1876-06-13  1937-10-16   61        Statistician
2  Florence Nightingale  1820-05-12  1910-08-13   90               Nurse
3           Marie Curie  1867-11-07  1934-07-04   66             Chemist
4         Rachel Carson  1907-05-27  1964-04-14   56           Biologist
5             John Snow  1813-03-15  1858-06-16   45           Physician
6           Alan Turing  1912-06-23  1954-06-07   41  Computer Scientist
7          Johann Gauss  1777-04-30  1855-02-23   77       Mathematician


In [131]:
ages = scientists["Age"]  # extract only the Age column
print(ages)

0    37
1    61
2    90
3    66
4    56
5    45
6    41
7    77
Name: Age, dtype: int64


In [136]:
print(ages.describe()) # show descriptive statistics

count     8.000000
mean     59.125000
std      18.325918
min      37.000000
25%      44.000000
50%      58.500000
75%      68.750000
max      90.000000
Name: Age, dtype: float64


In [37]:
ages[ages > ages.mean()] # keep only the ages greater than the mean

1    61
2    90
3    66
7    77
Name: Age, dtype: int64

In [39]:
print(ages > ages.mean()) # a comparison returns Boolean values

0    False
1     True
2     True
3     True
4    False
5    False
6    False
7     True
Name: Age, dtype: bool


- A Series (vector) of Boolean values can also be used to extract data

In [44]:
manual_bool_type = ages > ages.mean()
manual_bool_type # the stored Boolean values

0    False
1     True
2     True
3     True
4    False
5    False
6    False
7     True
Name: Age, dtype: bool

In [45]:
print(ages[manual_bool_type]) # returns only the rows where the Boolean is True

1    61
2    90
3    66
7    77
Name: Age, dtype: int64


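### **Broadcasting**
 - Why ages > ages.mean() can evaluate every element without a for loop
 - Operations on a Series or DataFrame are applied to all elements at once, which is called **broadcasting**
   -> more readable code and better-optimized execution
---
 - Computing vector with vector, and vector with scalar
 - Computing on two Series of the same length
 - **An operation involving a null value yields a null result**, so handle null values before computing!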
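
The point about null values can be seen in a minimal sketch. The Series below is hypothetical and only serves to show how NaN behaves in elementwise operations versus reductions.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# The scalar is broadcast to every element; the missing value stays missing
plus_ten = s + 10
print(plus_ten.isna().tolist())  # [False, True, False]

# Reductions such as sum() skip NaN by default
print(s.sum())                   # 4.0

# Filling missing values first changes the elementwise result
filled = s.fillna(0) + 10
print(filled.tolist())           # [11.0, 10.0, 13.0]
```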

In [46]:
ages + ages

0     74
1    122
2    180
3    132
4    112
5     90
6     82
7    154
Name: Age, dtype: int64

In [48]:
ages * ages

0    1369
1    3721
2    8100
3    4356
4    3136
5    2025
6    1681
7    5929
Name: Age, dtype: int64

In [49]:
ages + 100

0    137
1    161
2    190
3    166
4    156
5    145
6    141
7    177
Name: Age, dtype: int64

In [50]:
ages*2

0     74
1    122
2    180
3    132
4    112
5     90
6     82
7    154
Name: Age, dtype: int64

In [51]:
import numpy as np

pandas aligns data automatically by index label

In [58]:
rev_ages = ages.sort_index(ascending = False)  # sort by index in descending order
print(rev_ages)

7    77
6    41
5    45
4    56
3    66
2    90
1    61
0    37
Name: Age, dtype: int64


In [59]:
print(ages + rev_ages) # values are added by matching index label, not by row order!!

0     74
1    122
2    180
3    132
4    112
5     90
6     82
7    154
Name: Age, dtype: int64
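
Index alignment also explains what happens when the two indexes only partly overlap. The sketch below uses two small hypothetical Series; labels present on only one side produce NaN unless a fill_value is passed to add().

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=[0, 1, 2])
b = pd.Series([10, 20, 30], index=[2, 1, 5])

# Values are matched by label, so label 2 pairs 3 with 10
total = a + b
print(total.loc[1], total.loc[2])  # 22.0 13.0

# Labels 0 and 5 exist on one side only, so their results are NaN
print(int(total.isna().sum()))     # 2

# add() with fill_value treats the missing side as 0 instead
filled = a.add(b, fill_value=0)
print(filled.loc[0], filled.loc[5])  # 1.0 30.0
```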


---
- A DataFrame consists of an index, columns, and values

In [60]:
scientists.index

RangeIndex(start=0, stop=8, step=1)

In [61]:
scientists.columns

Index(['Name', 'Born', 'Died', 'Age', 'Occupation'], dtype='object')

In [62]:
scientists.values

array([['Rosaline Franklin', '1920-07-25', '1958-04-16', 37, 'Chemist'],
       ['William Gosset', '1876-06-13', '1937-10-16', 61, 'Statistician'],
       ['Florence Nightingale', '1820-05-12', '1910-08-13', 90, 'Nurse'],
       ['Marie Curie', '1867-11-07', '1934-07-04', 66, 'Chemist'],
       ['Rachel Carson', '1907-05-27', '1964-04-14', 56, 'Biologist'],
       ['John Snow', '1813-03-15', '1858-06-16', 45, 'Physician'],
       ['Alan Turing', '1912-06-23', '1954-06-07', 41,
        'Computer Scientist'],
       ['Johann Gauss', '1777-04-30', '1855-02-23', 77, 'Mathematician']],
      dtype=object)

DataFrames and Boolean extraction

In [63]:
print(scientists.loc[scientists["Age"]> scientists["Age"].mean()])

                   Name        Born        Died  Age     Occupation
1        William Gosset  1876-06-13  1937-10-16   61   Statistician
2  Florence Nightingale  1820-05-12  1910-08-13   90          Nurse
3           Marie Curie  1867-11-07  1934-07-04   66        Chemist
7          Johann Gauss  1777-04-30  1855-02-23   77  Mathematician


In [64]:
df = pd.read_csv('codes/data/gapminder.tsv', sep='\t')

In [66]:
df[5:20:2] # slicing

        country continent  year  lifeExp       pop    gdpPercap
5   Afghanistan      Asia  1977   38.438  14880372   786.113360
7   Afghanistan      Asia  1987   40.822  13867957   852.395945
9   Afghanistan      Asia  1997   41.763  22227415   635.341351
11  Afghanistan      Asia  2007   43.828  31889923   974.580338
13      Albania    Europe  1957   59.280   1476505  1942.284244
15      Albania    Europe  1967   66.220   1984060  2760.196931
17      Albania    Europe  1977   68.930   2509048  3533.003910
19      Albania    Europe  1987   72.000   3075321  3738.932735


In [72]:
first_half = scientists[:4]
second_half = scientists[4:]

print(first_half)
print("-" * 100)
print(second_half)

                   Name        Born        Died  Age    Occupation
0     Rosaline Franklin  1920-07-25  1958-04-16   37       Chemist
1        William Gosset  1876-06-13  1937-10-16   61  Statistician
2  Florence Nightingale  1820-05-12  1910-08-13   90         Nurse
3           Marie Curie  1867-11-07  1934-07-04   66       Chemist
----------------------------------------------------------------------------------------------------
            Name        Born        Died  Age          Occupation
4  Rachel Carson  1907-05-27  1964-04-14   56           Biologist
5      John Snow  1813-03-15  1858-06-16   45           Physician
6    Alan Turing  1912-06-23  1954-06-07   41  Computer Scientist
7   Johann Gauss  1777-04-30  1855-02-23   77       Mathematician


In [70]:
print(scientists * 2) # strings are repeated twice; integers are multiplied by 2


                                       Name                  Born  \
0        Rosaline FranklinRosaline Franklin  1920-07-251920-07-25   
1              William GossetWilliam Gosset  1876-06-131876-06-13   
2  Florence NightingaleFlorence Nightingale  1820-05-121820-05-12   
3                    Marie CurieMarie Curie  1867-11-071867-11-07   
4                Rachel CarsonRachel Carson  1907-05-271907-05-27   
5                        John SnowJohn Snow  1813-03-151813-03-15   
6                    Alan TuringAlan Turing  1912-06-231912-06-23   
7                  Johann GaussJohann Gauss  1777-04-301777-04-30   

                   Died  Age                            Occupation  
0  1958-04-161958-04-16   74                        ChemistChemist  
1  1937-10-161937-10-16  122              StatisticianStatistician  
2  1910-08-131910-08-13  180                            NurseNurse  
3  1934-07-041934-07-04  132                        ChemistChemist  
4  1964-04-141964-04-14  112     

In [80]:
df1 = df2 = pd.DataFrame(data = [[1,2,3], [4,5,6], [7,8,9]]) # 3 rows, 3 columns
df1

   0  1  2
0  1  2  3
1  4  5  6
2  7  8  9


In [81]:
df_added = df1.add(df2) # addition works because both frames have the same shape and labels
df_added

    0   1   2
0   2   4   6
1   8  10  12
2  14  16  18


In [83]:
df_sub = df1.sub(df2)
df_sub

   0  1  2
0  0  0  0
1  0  0  0
2  0  0  0


---
Converting data in Series and DataFrames

In [84]:
scientists.dtypes

Name          object
Born          object
Died          object
Age            int64
Occupation    object
dtype: object

In [88]:
born_datetime = pd.to_datetime(scientists["Born"], format = "%Y-%m-%d")
print(born_datetime)

0   1920-07-25
1   1876-06-13
2   1820-05-12
3   1867-11-07
4   1907-05-27
5   1813-03-15
6   1912-06-23
7   1777-04-30
Name: Born, dtype: datetime64[ns]


In [90]:
died_datetime = pd.to_datetime(scientists["Died"], format = "%Y-%m-%d")
print(died_datetime)

0   1958-04-16
1   1937-10-16
2   1910-08-13
3   1934-07-04
4   1964-04-14
5   1858-06-16
6   1954-06-07
7   1855-02-23
Name: Died, dtype: datetime64[ns]


In [100]:
scientists["born_dt"], scientists["died_dt"] =born_datetime, died_datetime

In [137]:
print(scientists.head())

                   Name        Born        Died  Age    Occupation    born_dt  \
0     Rosaline Franklin  1920-07-25  1958-04-16   61       Chemist 1920-07-25   
1        William Gosset  1876-06-13  1937-10-16   45  Statistician 1876-06-13   
2  Florence Nightingale  1820-05-12  1910-08-13   37         Nurse 1820-05-12   
3           Marie Curie  1867-11-07  1934-07-04   77       Chemist 1867-11-07   
4         Rachel Carson  1907-05-27  1964-04-14   90     Biologist 1907-05-27   

     died_dt   age_days                  age_years age_days_assign  \
0 1958-04-16 13779 days 37 days 18:00:59.178082191      13779 days   
1 1937-10-16 22404 days 61 days 09:08:23.013698630      22404 days   
2 1910-08-13 32964 days 90 days 07:29:45.205479452      32964 days   
3 1934-07-04 24345 days 66 days 16:46:01.643835616      24345 days   
4 1964-04-14 20777 days 56 days 22:09:32.054794520      20777 days   

   age_year_assign  
0             37.0  
1             61.0  
2             90.0  
3       

In [138]:
print(scientists.shape)

(8, 11)


In [139]:
print(scientists.dtypes)

Name                        object
Born                        object
Died                        object
Age                          int64
Occupation                  object
born_dt             datetime64[ns]
died_dt             datetime64[ns]
age_days           timedelta64[ns]
age_years          timedelta64[ns]
age_days_assign    timedelta64[ns]
age_year_assign            float64
dtype: object


---
Transforming column contents

In [93]:
print(scientists["Age"])

0    37
1    61
2    90
3    66
4    56
5    45
6    41
7    77
Name: Age, dtype: int64


In [141]:
# Shuffle the column contents randomly
print(scientists["Age"].sample(frac = 1, random_state = 42))
# sample: randomly draws values from the Series; frac = 1 means draw 100% of the values
# random_state: sets the seed for the random number generator

1    61
5    45
0    37
7    77
2    90
4    56
3    66
6    41
Name: Age, dtype: int64


In [96]:
# Assigning the shuffled Series back aligns on the index, so the original order is restored
scientists["Age"] = scientists["Age"].sample(frac = 1, random_state = 42)
print(scientists["Age"])

0    37
1    61
2    90
3    66
4    56
5    45
6    41
7    77
Name: Age, dtype: int64


In [97]:
# .values drops the index, so the shuffled order is kept when assigning
scientists["Age"] = scientists["Age"].sample(frac = 1, random_state = 42).values
print(scientists["Age"])

0    61
1    45
2    37
3    77
4    90
5    56
6    66
7    41
Name: Age, dtype: int64
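
The difference between the two assignments above comes from index alignment. A minimal sketch with a hypothetical three-row frame (to_numpy() is used here in place of .values; both return the underlying array without the index):

```python
import pandas as pd

df = pd.DataFrame({"Age": [37, 61, 90]})
shuffled = df["Age"].sample(frac=1, random_state=0)

# Assigning a Series matches values by index label,
# so every value lands back on its original row
df["aligned"] = shuffled
print(df["aligned"].tolist() == df["Age"].tolist())         # True

# Assigning a bare array is positional, so the shuffled order is kept
df["positional"] = shuffled.to_numpy()
print(df["positional"].tolist() == shuffled.tolist())       # True
```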


In [101]:
scientists["age_days"] = scientists["died_dt"] - scientists["born_dt"]

In [104]:
scientists["age_years"]  = scientists["age_days"] /365
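
Note that dividing a timedelta column by the integer 365 still yields a timedelta (which is why age_years displays values such as "37 days 18:00:59"). If a plain number of years is wanted, one option is to divide by a Timedelta instead. The sketch below rebuilds two rows by hand and assumes a 365-day year, as the notebook does.

```python
import pandas as pd

born = pd.to_datetime(pd.Series(["1920-07-25", "1876-06-13"]))
died = pd.to_datetime(pd.Series(["1958-04-16", "1937-10-16"]))

age_days = died - born                 # timedelta Series
print(age_days.dt.days.tolist())       # [13779, 22404]

# timedelta / int keeps the timedelta dtype (dtype kind "m")
shrunk = age_days / 365
print(shrunk.dtype.kind)               # m

# timedelta / Timedelta gives plain floats
age_years = age_days / pd.Timedelta(days=365)
print(age_years.round(2).tolist())     # [37.75, 61.38]
```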

---
Modifying columns with assign()

In [142]:
scientists = scientists.assign(
    age_days_assign = scientists["died_dt"] - scientists["born_dt"],
    age_year_assign =  (scientists["age_days"].dt.days / 365).apply(np.floor)  
)         # convert the number of days to years
print(scientists)

                   Name        Born        Died  Age          Occupation  \
0     Rosaline Franklin  1920-07-25  1958-04-16   61             Chemist   
1        William Gosset  1876-06-13  1937-10-16   45        Statistician   
2  Florence Nightingale  1820-05-12  1910-08-13   37               Nurse   
3           Marie Curie  1867-11-07  1934-07-04   77             Chemist   
4         Rachel Carson  1907-05-27  1964-04-14   90           Biologist   
5             John Snow  1813-03-15  1858-06-16   56           Physician   
6           Alan Turing  1912-06-23  1954-06-07   66  Computer Scientist   
7          Johann Gauss  1777-04-30  1855-02-23   41       Mathematician   

     born_dt    died_dt   age_days                  age_years age_days_assign  \
0 1920-07-25 1958-04-16 13779 days 37 days 18:00:59.178082191      13779 days   
1 1876-06-13 1937-10-16 22404 days 61 days 09:08:23.013698630      22404 days   
2 1820-05-12 1910-08-13 32964 days 90 days 07:29:45.205479452      32964

In [111]:
# Same result using a lambda

scientists = scientists.assign(
    age_days_assign = scientists["died_dt"] - scientists["born_dt"],
    age_year_assign = lambda df_: (df_["age_days_assign"].dt.days / 365).apply(np.floor)  
)         # convert the number of days to years
print(scientists)

                   Name        Born        Died  Age          Occupation  \
0     Rosaline Franklin  1920-07-25  1958-04-16   61             Chemist   
1        William Gosset  1876-06-13  1937-10-16   45        Statistician   
2  Florence Nightingale  1820-05-12  1910-08-13   37               Nurse   
3           Marie Curie  1867-11-07  1934-07-04   77             Chemist   
4         Rachel Carson  1907-05-27  1964-04-14   90           Biologist   
5             John Snow  1813-03-15  1858-06-16   56           Physician   
6           Alan Turing  1912-06-23  1954-06-07   66  Computer Scientist   
7          Johann Gauss  1777-04-30  1855-02-23   41       Mathematician   

     born_dt    died_dt   age_days                  age_years age_days_assign  \
0 1920-07-25 1958-04-16 13779 days 37 days 18:00:59.178082191      13779 days   
1 1876-06-13 1937-10-16 22404 days 61 days 09:08:23.013698630      22404 days   
2 1820-05-12 1910-08-13 32964 days 90 days 07:29:45.205479452      32964
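
The benefit of the lambda form is that it receives the frame as it exists inside the assign() call, including columns created by earlier keyword arguments. A small sketch with hypothetical column names:

```python
import pandas as pd

df = pd.DataFrame({"days": [365, 730]})

# assign() returns a new DataFrame and leaves df untouched.
# The second lambda can read "years", which was created just before it.
out = df.assign(
    years=lambda d: d["days"] / 365,
    years_floor=lambda d: d["years"].astype(int),
)

print(out["years_floor"].tolist())  # [1, 2]
print("years" in df.columns)        # False
```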

---
Dropping columns

In [112]:
print(scientists.columns)

Index(['Name', 'Born', 'Died', 'Age', 'Occupation', 'born_dt', 'died_dt',
       'age_days', 'age_years', 'age_days_assign', 'age_year_assign'],
      dtype='object')


In [113]:
scientists_dropped = scientists.drop(["Age"], axis = "columns")

In [114]:
print(scientists_dropped.columns)

Index(['Name', 'Born', 'Died', 'Occupation', 'born_dt', 'died_dt', 'age_days',
       'age_years', 'age_days_assign', 'age_year_assign'],
      dtype='object')
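
As with assign(), drop() returns a new DataFrame rather than changing the original. A short sketch on a hypothetical frame, using the columns= keyword as an alternative to axis="columns":

```python
import pandas as pd

df = pd.DataFrame({"a": [1], "b": [2], "c": [3]})

# columns=[...] is equivalent to passing the labels with axis="columns"
dropped = df.drop(columns=["a", "b"])

print(list(dropped.columns))  # ['c']
print(list(df.columns))       # ['a', 'b', 'c']
```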


---
Saving and loading data

In [115]:
names = scientists["Name"]
print(names)

0       Rosaline Franklin
1          William Gosset
2    Florence Nightingale
3             Marie Curie
4           Rachel Carson
5               John Snow
6             Alan Turing
7            Johann Gauss
Name: Name, dtype: object


In [144]:
scientists.to_csv( "./scientists_df_no_index.csv", index = False)
# Set index = False; otherwise the index is written as an extra column and duplicated on reload
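
The effect of index = False can be checked without touching the disk: to_csv() returns a string when no path is given, and read_csv() accepts a file-like object. The frame below is hypothetical.

```python
import io
import pandas as pd

df = pd.DataFrame({"Name": ["A", "B"], "Age": [1, 2]})

with_index = df.to_csv()                # index written as an unnamed first column
without_index = df.to_csv(index=False)  # only the data columns

back_with = pd.read_csv(io.StringIO(with_index))
back_without = pd.read_csv(io.StringIO(without_index))

print(list(back_with.columns))     # ['Unnamed: 0', 'Name', 'Age']
print(list(back_without.columns))  # ['Name', 'Age']
```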

---
Saving to Excel

In [150]:
import openpyxl

In [153]:
names = scientists["Name"] # extract only the Name column
names

0       Rosaline Franklin
1          William Gosset
2    Florence Nightingale
3             Marie Curie
4           Rachel Carson
5               John Snow
6             Alan Turing
7            Johann Gauss
Name: Name, dtype: object

In [154]:
names_df = names.to_frame()  # convert the Series to a DataFrame

In [156]:
names_df.to_excel("./scientists_names_series_df.xlsx", engine = "openpyxl") # save as an Excel file (openpyxl writes the .xlsx format)

In [157]:
scientists.to_excel("./scientists_df.xlsx",
                   sheet_name = "scientists",
                   index = False)

---
Converting to a dictionary

In [159]:
sci_sub_dict = scientists.head(2)
sci_sub_dict

                Name        Born        Died  Age    Occupation    born_dt  \
0  Rosaline Franklin  1920-07-25  1958-04-16   61       Chemist 1920-07-25
1     William Gosset  1876-06-13  1937-10-16   45  Statistician 1876-06-13

     died_dt   age_days                  age_years age_days_assign  \
0 1958-04-16 13779 days 37 days 18:00:59.178082191      13779 days
1 1937-10-16 22404 days 61 days 09:08:23.013698630      22404 days

   age_year_assign
0             37.0
1             61.0


In [161]:
sci_dict = sci_sub_dict.to_dict() # convert to a dictionary
sci_dict

{'Name': {0: 'Rosaline Franklin', 1: 'William Gosset'},
 'Born': {0: '1920-07-25', 1: '1876-06-13'},
 'Died': {0: '1958-04-16', 1: '1937-10-16'},
 'Age': {0: 61, 1: 45},
 'Occupation': {0: 'Chemist', 1: 'Statistician'},
 'born_dt': {0: Timestamp('1920-07-25 00:00:00'),
  1: Timestamp('1876-06-13 00:00:00')},
 'died_dt': {0: Timestamp('1958-04-16 00:00:00'),
  1: Timestamp('1937-10-16 00:00:00')},
 'age_days': {0: Timedelta('13779 days 00:00:00'),
  1: Timedelta('22404 days 00:00:00')},
 'age_years': {0: Timedelta('37 days 18:00:59.178082191'),
  1: Timedelta('61 days 09:08:23.013698630')},
 'age_days_assign': {0: Timedelta('13779 days 00:00:00'),
  1: Timedelta('22404 days 00:00:00')},
 'age_year_assign': {0: 37.0, 1: 61.0}}

In [164]:
import pprint   # prints data in an easy-to-read format
pprint.pprint(sci_dict)  

{'Age': {0: 61, 1: 45},
 'Born': {0: '1920-07-25', 1: '1876-06-13'},
 'Died': {0: '1958-04-16', 1: '1937-10-16'},
 'Name': {0: 'Rosaline Franklin', 1: 'William Gosset'},
 'Occupation': {0: 'Chemist', 1: 'Statistician'},
 'age_days': {0: Timedelta('13779 days 00:00:00'),
              1: Timedelta('22404 days 00:00:00')},
 'age_days_assign': {0: Timedelta('13779 days 00:00:00'),
                     1: Timedelta('22404 days 00:00:00')},
 'age_year_assign': {0: 37.0, 1: 61.0},
 'age_years': {0: Timedelta('37 days 18:00:59.178082191'),
               1: Timedelta('61 days 09:08:23.013698630')},
 'born_dt': {0: Timestamp('1920-07-25 00:00:00'),
             1: Timestamp('1876-06-13 00:00:00')},
 'died_dt': {0: Timestamp('1958-04-16 00:00:00'),
             1: Timestamp('1937-10-16 00:00:00')}}


In [167]:
sci_dict_df = pd.DataFrame.from_dict(sci_dict)  # a dictionary can be converted back to a DataFrame
sci_dict_df

                Name        Born        Died  Age    Occupation    born_dt  \
0  Rosaline Franklin  1920-07-25  1958-04-16   61       Chemist 1920-07-25
1     William Gosset  1876-06-13  1937-10-16   45  Statistician 1876-06-13

     died_dt   age_days                  age_years age_days_assign  \
0 1958-04-16 13779 days 37 days 18:00:59.178082191      13779 days
1 1937-10-16 22404 days 61 days 09:08:23.013698630      22404 days

   age_year_assign
0             37.0
1             61.0
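
to_dict() defaults to a column-oriented layout ({column: {index: value}}). When a list of row dictionaries is more convenient, orient="records" is one alternative. The sketch below uses a hypothetical two-row frame.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["A", "B"], "Age": [1, 2]})

# One dictionary per row; the index is not included
records = df.to_dict(orient="records")
print(records)  # [{'Name': 'A', 'Age': 1}, {'Name': 'B', 'Age': 2}]

# A list of row dictionaries can be passed straight back to DataFrame
rebuilt = pd.DataFrame(records)
print(rebuilt.to_dict(orient="list") == df.to_dict(orient="list"))  # True
```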


---
Converting to JSON

In [168]:
sci_json = sci_sub_dict.to_json(orient = "records", indent=2, date_format = "iso")
pprint.pprint(sci_json)

('[\n'
 '  {\n'
 '    "Name":"Rosaline Franklin",\n'
 '    "Born":"1920-07-25",\n'
 '    "Died":"1958-04-16",\n'
 '    "Age":61,\n'
 '    "Occupation":"Chemist",\n'
 '    "born_dt":"1920-07-25T00:00:00.000",\n'
 '    "died_dt":"1958-04-16T00:00:00.000",\n'
 '    "age_days":"P13779DT0H0M0S",\n'
 '    "age_years":"P37DT18H0M59.178082191S",\n'
 '    "age_days_assign":"P13779DT0H0M0S",\n'
 '    "age_year_assign":37.0\n'
 '  },\n'
 '  {\n'
 '    "Name":"William Gosset",\n'
 '    "Born":"1876-06-13",\n'
 '    "Died":"1937-10-16",\n'
 '    "Age":45,\n'
 '    "Occupation":"Statistician",\n'
 '    "born_dt":"1876-06-13T00:00:00.000",\n'
 '    "died_dt":"1937-10-16T00:00:00.000",\n'
 '    "age_days":"P22404DT0H0M0S",\n'
 '    "age_years":"P61DT9H8M23.013698630S",\n'
 '    "age_days_assign":"P22404DT0H0M0S",\n'
 '    "age_year_assign":61.0\n'
 '  }\n'
 ']')


In [172]:
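from io import StringIO  # passing a literal JSON string to read_json is deprecated in recent pandas

sci_json_df = pd.read_json(
    StringIO('[\n'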
 '  {\n'
 '    "Name":"Rosaline Franklin",\n'
 '    "Born":"1920-07-25",\n'
 '    "Died":"1958-04-16",\n'
 '    "Age":61,\n'
 '    "Occupation":"Chemist",\n'
 '    "born_dt":"1920-07-25T00:00:00.000",\n'
 '    "died_dt":"1958-04-16T00:00:00.000",\n'
 '    "age_days":"P13779DT0H0M0S",\n'
 '    "age_years":"P37DT18H0M59.178082191S",\n'
 '    "age_days_assign":"P13779DT0H0M0S",\n'
 '    "age_year_assign":37.0\n'
 '  },\n'
 '  {\n'
 '    "Name":"William Gosset",\n'
 '    "Born":"1876-06-13",\n'
 '    "Died":"1937-10-16",\n'
 '    "Age":45,\n'
 '    "Occupation":"Statistician",\n'
 '    "born_dt":"1876-06-13T00:00:00.000",\n'
 '    "died_dt":"1937-10-16T00:00:00.000",\n'
 '    "age_days":"P22404DT0H0M0S",\n'
 '    "age_years":"P61DT9H8M23.013698630S",\n'
 '    "age_days_assign":"P22404DT0H0M0S",\n'
 '    "age_year_assign":61.0\n'
 '  }\n'
 ']'), 
    orient = "records")

print(sci_json_df)  

                Name        Born        Died  Age    Occupation  \
0  Rosaline Franklin  1920-07-25  1958-04-16   61       Chemist   
1     William Gosset  1876-06-13  1937-10-16   45  Statistician   

                   born_dt                  died_dt        age_days  \
0  1920-07-25T00:00:00.000  1958-04-16T00:00:00.000  P13779DT0H0M0S   
1  1876-06-13T00:00:00.000  1937-10-16T00:00:00.000  P22404DT0H0M0S   

                 age_years age_days_assign  age_year_assign  
0  P37DT18H0M59.178082191S  P13779DT0H0M0S               37  
1   P61DT9H8M23.013698630S  P22404DT0H0M0S               61  



In [174]:
sci_json_df.dtypes # the date and time columns no longer have their original dtypes

Name               object
Born               object
Died               object
Age                 int64
Occupation         object
born_dt            object
died_dt            object
age_days           object
age_years          object
age_days_assign    object
age_year_assign     int64
dtype: object

In [176]:
# Convert the date column back to a datetime object
sci_json_df["died_dt_json"] = pd.to_datetime(sci_json_df["died_dt"])
print(sci_json_df)

                Name        Born        Died  Age    Occupation  \
0  Rosaline Franklin  1920-07-25  1958-04-16   61       Chemist   
1     William Gosset  1876-06-13  1937-10-16   45  Statistician   

                   born_dt                  died_dt        age_days  \
0  1920-07-25T00:00:00.000  1958-04-16T00:00:00.000  P13779DT0H0M0S   
1  1876-06-13T00:00:00.000  1937-10-16T00:00:00.000  P22404DT0H0M0S   

                 age_years age_days_assign  age_year_assign died_dt_json  
0  P37DT18H0M59.178082191S  P13779DT0H0M0S               37   1958-04-16  
1   P61DT9H8M23.013698630S  P22404DT0H0M0S               61   1937-10-16  


In [177]:
sci_json_df.dtypes

Name                       object
Born                       object
Died                       object
Age                         int64
Occupation                 object
born_dt                    object
died_dt                    object
age_days                   object
age_years                  object
age_days_assign            object
age_year_assign             int64
died_dt_json       datetime64[ns]
dtype: object
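
When several date columns come back from JSON as strings, they can be converted in one step by applying pd.to_datetime column-wise. This is a sketch on a hand-built frame that mirrors the two date columns above; the date_cols list is an assumption for illustration.

```python
import pandas as pd

# String dates in the ISO format produced by to_json(date_format="iso")
df = pd.DataFrame({
    "born_dt": ["1920-07-25T00:00:00.000", "1876-06-13T00:00:00.000"],
    "died_dt": ["1958-04-16T00:00:00.000", "1937-10-16T00:00:00.000"],
})

date_cols = ["born_dt", "died_dt"]

# DataFrame.apply passes each column (a Series) to pd.to_datetime
df[date_cols] = df[date_cols].apply(pd.to_datetime)

print([df[c].dtype.kind for c in date_cols])           # ['M', 'M'] (datetime64)
print((df["died_dt"] - df["born_dt"]).dt.days.tolist())  # [13779, 22404]
```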