# # How to extract specific values from a DataFrame (JSON-like strings) with Python?
# >> ast.literal_eval
: Used to parse a string into the Python data structure it represents  
- [ast — Abstract Syntax Trees](https://docs.python.org/3/library/ast.html)  
- [The difference between Python eval and literal_eval (in Korean)](https://bluese05.tistory.com/65)   
- [Python eval() function: why it should be used with care (in Korean)](https://bluese05.tistory.com/64?category=559959)

### ■ Example

In [None]:
import ast

ex = "{'a': 1, 'b': 2, 'c': 3}"
print(type(ex))
print(ex)

ex = ast.literal_eval(ex)
print(type(ex)) 
print(ex)

In [None]:
ex = """(1, 2, {'foo': 'bar'})"""
print(type(ex))
print(ex)

ex = ast.literal_eval(ex)
print(type(ex)) 
print(ex)

In [None]:
ex = '''['a','b','c','d','e']'''
print(type(ex))
print(ex)

ex = ast.literal_eval(ex)
print(type(ex))
print(ex)

In [None]:
# ex = 'a,b,c,d,e,'
# print(type(ex))
# print(ex)

# ex = ast.literal_eval(ex) # ValueError: malformed node or string: the string must already be written as a valid Python literal for the conversion to work
# print(type(ex))
# print(ex) 
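
The failure case above suggests wrapping the call when the input may not be a clean literal. The following is a minimal sketch, not part of the original notebook: `safe_literal_eval` and its `default` parameter are hypothetical names introduced here for illustration.

```python
import ast

def safe_literal_eval(text, default=None):
    """Parse a Python-literal string; return `default` if parsing fails.

    Hypothetical helper for illustration only.
    """
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # ValueError: the text is not a literal (e.g. bare names like a,b,c)
        # SyntaxError: the text is not valid Python at all
        return default

parsed_ok = safe_literal_eval("{'a': 1, 'b': 2}")
parsed_bad = safe_literal_eval("a,b,c,d,e,", default=[])
parsed_syntax = safe_literal_eval("{'a': 1", default={})
print(parsed_ok, parsed_bad, parsed_syntax)
```

Returning a default keeps a later `apply` over a whole column from stopping at the first malformed cell, at the cost of hiding which cells failed.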

### ■ With Real Data

In [None]:
import pandas as pd

movies_df = pd.read_csv('../input/the-movies-dataset/movies_metadata.csv')
movies_df.head()

**1) Extract only the values of a specific key**

In [None]:
movies_df.genres[0]

In [None]:
genres1 = movies_df.genres.apply(ast.literal_eval).apply(lambda x : [i['name'] for i in x] if isinstance(x, list) else [])
genres1

In [None]:
genres2 = movies_df.genres.apply(ast.literal_eval).apply(lambda x : [i['name'] for i in x])
genres2

In [None]:
genres1.equals(genres2)

In [None]:
movies_df['genres'] = genres1
movies_df['genres']
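
To try the same extraction without downloading the dataset, here is a small sketch on made-up rows. It assumes, as the code above implies, that each `genres` cell is a string holding a list of dicts with a `'name'` key. The frame `toy_df` and the helper `names_from` are hypothetical.

```python
import ast
import pandas as pd

# Made-up rows that imitate the assumed format of the `genres` column
toy_df = pd.DataFrame({
    'title': ['Movie A', 'Movie B', 'Movie C'],
    'genres': [
        "[{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}]",
        "[{'id': 18, 'name': 'Drama'}]",
        "[]",
    ],
})

def names_from(raw, key='name'):
    """Parse one cell and collect the value of `key` from each dict in it."""
    parsed = ast.literal_eval(raw)
    if not isinstance(parsed, list):
        return []
    # Skip entries that are not dicts or that lack the key
    return [item[key] for item in parsed if isinstance(item, dict) and key in item]

toy_df['genre_names'] = toy_df['genres'].apply(names_from)
print(toy_df[['title', 'genre_names']])
```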

**2) From entries that match a specific value, extract the value of another key**

In [None]:
credits_df = pd.read_csv('../input/the-movies-dataset/credits.csv')
credits_df.head()

In [None]:
credits_df.crew[0]

In [None]:
import numpy as np

def get_director(x):
    for i in x:
        if i['job'] == 'Director':
            return i['name']
    return np.nan

In [None]:
# credits_df['crew'].apply(get_director) # TypeError: string indices must be integers

In [None]:
credits_df['crew'].apply(ast.literal_eval).apply(get_director)
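
The same lookup can be sketched on made-up data with `next()` and a generator expression, which returns the first match or a fallback. The frame `toy_credits`, the person names, and the helper `first_name_with_job` are all hypothetical; the cell format is assumed to match what `credits_df.crew[0]` shows.

```python
import ast
import numpy as np
import pandas as pd

# Made-up rows that imitate the assumed format of the `crew` column
toy_credits = pd.DataFrame({
    'id': [1, 2],
    'crew': [
        "[{'job': 'Director', 'name': 'Person A'}, {'job': 'Editor', 'name': 'Person B'}]",
        "[{'job': 'Producer', 'name': 'Person C'}]",
    ],
})

def first_name_with_job(crew, job='Director'):
    """Return the name of the first crew member with `job`, or NaN if none."""
    return next((m['name'] for m in crew if m.get('job') == job), np.nan)

toy_credits['director'] = (
    toy_credits['crew'].apply(ast.literal_eval).apply(first_name_with_job)
)
print(toy_credits[['id', 'director']])
```

Using `m.get('job')` instead of `m['job']` avoids a `KeyError` if some entry lacks the key.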

# # How to split a cell into multiple rows in a DataFrame?
# >> Series.stack().reset_index(level=1, drop=True)
- [pandas.DataFrame.stack](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.stack.html)   
- [pandas.DataFrame.reset_index](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reset_index.html)  
- [Pandas: splitting the data in one cell into multiple rows (in Korean)](https://ohgyun.com/768)

### ■ Example

In [None]:
df = pd.DataFrame({'column1':['a,b,c,d,e','d,e,f','h,i']})
df

In [None]:
# Split the data in each row on commas
df.column1.str.split(',')

In [None]:
# Expand each split list into a Series (one column per element)
divided_df = df.column1.str.split(',').apply(lambda x : pd.Series(x))
divided_df

In [None]:
# Return a reshaped DataFrame or Series having a multi-level index
divided_df.stack()

In [None]:
divided_df = divided_df.stack().reset_index(level=1, drop=True).to_frame('column2')
# level=1: remove only the inner index level that stack() created
# drop=True: discard that level instead of adding it back as a column

divided_df

In [None]:
# Join with the original DataFrame

print(df)
df.merge(divided_df, left_index=True, right_index=True, how='left')
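
As an alternative to the stack-based steps, pandas 0.25 and later provide `Series.explode`, which turns each list element into its own row while repeating the index. The sketch below rebuilds the example frame and compares both routes. The `.dropna()` after `stack()` is an addition here so the comparison does not depend on how a given pandas version treats missing values in `stack()`.

```python
import pandas as pd

df = pd.DataFrame({'column1': ['a,b,c,d,e', 'd,e,f', 'h,i']})

# Route 1: split -> one column per element -> stack -> drop the inner level
stacked = (
    df['column1'].str.split(',')
    .apply(pd.Series)
    .stack()
    .dropna()
    .reset_index(level=1, drop=True)
    .rename('column2')
)

# Route 2: split -> explode (one row per list element, index repeated)
exploded = df['column1'].str.split(',').explode().rename('column2')

print(stacked.tolist() == exploded.tolist())
print(df.join(exploded))
```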

### ■ With Real Data

In [None]:
movies_df['genres']

In [None]:
# Convert to Series: method 1
movies_df.genres.apply(lambda x : pd.Series(x))

In [None]:
# Convert to Series: method 2
movies_df.apply(lambda x: pd.Series(x['genres']),axis=1)

In [None]:
movies_df.genres.apply(lambda x : pd.Series(x)).stack()

In [None]:
s = movies_df.genres.apply(lambda x : pd.Series(x)).stack().reset_index(level=1, drop=True)
s.name = 'genre'
s

In [None]:
movies_df = movies_df.drop('genres', axis=1).join(s)
movies_df.genre
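
Because `apply(lambda x: pd.Series(x))` builds one Series per row, it can be slow on a large frame. For a column that already holds lists, `DataFrame.explode` (pandas 0.25+) is a possible shortcut. This is a sketch on made-up data, not the dataset itself; `toy_movies` and its values are hypothetical.

```python
import pandas as pd

# Made-up frame whose `genres` column already holds Python lists
toy_movies = pd.DataFrame({
    'title': ['Movie A', 'Movie B', 'Movie C'],
    'genres': [['Animation', 'Comedy'], ['Drama'], []],
})

# One row per genre; a row with an empty list is kept with NaN as its genre
long_df = toy_movies.explode('genres').rename(columns={'genres': 'genre'})
print(long_df)

# Example use of the long format: number of movies per genre (NaN is ignored)
genre_counts = long_df['genre'].value_counts()
print(genre_counts)
```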