## DataFrame: anatomy, important attributes & methods, and data types
### What is a DataFrame?
- A DataFrame represents a tabular, spreadsheet-like data structure containing an ordered collection of columns, each of which can be a different value type (numeric, string, boolean, etc.).
- The DataFrame has both a row and column index.
- Compared with other DataFrame-like structures you may have used before (such as R’s data.frame), row-oriented and column-oriented operations on a DataFrame are treated roughly symmetrically.

### Anatomy of a DataFrame
<img align="left" src="https://miro.medium.com/max/2000/1*aJJjpCHMNVyfM3UIJZyYcw.png">

In [2]:
import pandas as pd
from pandas import Series, DataFrame

In [6]:
#Create sample data
data = {'state': ['서울', '서울', '서울', '부산', '부산', '부산'],
        'year': [2014, 2016, 2018, 2014, 2016, 2018],
        'pop': [997.5, 984.3, 970.5, 345.2, 344.7, 340.0]}
data

{'state': ['서울', '서울', '서울', '부산', '부산', '부산'],
 'year': [2014, 2016, 2018, 2014, 2016, 2018],
 'pop': [997.5, 984.3, 970.5, 345.2, 344.7, 340.0]}

In [7]:
#Convert the dict to a DataFrame
df = DataFrame(data)
df

Unnamed: 0,state,year,pop
0,서울,2014,997.5
1,서울,2016,984.3
2,서울,2018,970.5
3,부산,2014,345.2
4,부산,2016,344.7
5,부산,2018,340.0


### Important Attributes and Methods
| Attributes or Methods | Results |
|:---|:---|
| df.index | Array-like row labels |
| df.columns | Array-like column labels |
| df.values | NumPy array of the underlying data |
| df.shape | (n_rows, m_cols) |
| df.dtypes | Type of each column |
| len(df) | Number of rows |
| df.head(), df.tail() | First/last rows |
| df.describe() | Summary stats |
| df.info() | Concise summary of a DataFrame (index, dtypes, non-null counts, memory usage) |
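
The attributes in the table can be tried on any small frame. The sketch below uses a hypothetical two-column frame named `demo` (not part of this notebook's data) purely for illustration.

```python
import pandas as pd

# Hypothetical frame used only for this illustration.
demo = pd.DataFrame({"city": ["A", "B", "C"], "score": [1.5, 2.5, 3.5]})

row_labels = list(demo.index)      # [0, 1, 2]
col_labels = list(demo.columns)    # ['city', 'score']
n_rows, n_cols = demo.shape        # 3 rows, 2 columns

print(row_labels, col_labels)
print(n_rows, n_cols, len(demo))   # len() counts rows only
print(demo.values.shape)           # the NumPy array has the same shape
```

Note that `len(demo)` equals the first element of `demo.shape`, not the total number of cells.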

In [9]:
#Show the first 5 rows
df.head()

Unnamed: 0,state,year,pop
0,서울,2014,997.5
1,서울,2016,984.3
2,서울,2018,970.5
3,부산,2014,345.2
4,부산,2016,344.7


In [11]:
#Show the last 5 rows
df.tail()

Unnamed: 0,state,year,pop
1,서울,2016,984.3
2,서울,2018,970.5
3,부산,2014,345.2
4,부산,2016,344.7
5,부산,2018,340.0


In [12]:
#n_rows
len(df)

6

In [13]:
#n_rows * n_cols
df.size

18

### Data types
<img align="left" src="http://drive.google.com/uc?export=view&id=1x5GP26v8C5oUDB3RY2mADHLRf9K1WgDe">

In [14]:
#Check the data type of each column
df.dtypes

state     object
year       int64
pop      float64
dtype: object

In [15]:
df.dtypes.value_counts()

int64      1
float64    1
object     1
dtype: int64
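
Once the dtypes are known, columns can also be picked by type. The following is a minimal sketch using `select_dtypes` on a hypothetical frame `demo` that mirrors the column layout above; the variable names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical frame with one text column and two numeric columns.
demo = pd.DataFrame({
    "state": ["x", "y"],
    "year": [2014, 2016],
    "pop": [1.0, 2.0],
})

# Keep only numeric columns, or everything except numeric columns.
numeric_cols = demo.select_dtypes(include="number").columns.tolist()
text_cols = demo.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)
print(text_cols)
```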

In [20]:
#Summary information about df
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   state   6 non-null      object 
 1   year    6 non-null      int64  
 2   pop     6 non-null      float64
dtypes: float64(1), int64(1), object(1)
memory usage: 272.0+ bytes


In [21]:
#Summary statistics for df
df.describe()

Unnamed: 0,year,pop
count,6.0,6.0
mean,2016.0,663.7
std,1.788854,351.089157
min,2014.0,340.0
25%,2014.5,344.825
50%,2016.0,657.85
75%,2017.5,980.85
max,2018.0,997.5


In [23]:
#Use the astype method to change the data type of pandas data
df.describe().astype('int')

Unnamed: 0,year,pop
count,6,6
mean,2016,663
std,1,351
min,2014,340
25%,2014,344
50%,2016,657
75%,2017,980
max,2018,997
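
`astype` also accepts a dict, which converts selected columns only. The sketch below assumes a small hypothetical frame `demo`; note that casting floats to integers truncates the fractional part rather than rounding, which matches the `663.7` to `663` change above.

```python
import pandas as pd

# Hypothetical frame used only for this illustration.
demo = pd.DataFrame({"year": [2014, 2016, 2018], "pop": [997.5, 984.3, 970.5]})

# Convert each column to a different dtype; astype returns a new frame.
converted = demo.astype({"year": "float64", "pop": "int64"})

print(converted.dtypes)
print(converted["pop"].tolist())   # fractional parts are dropped
print(demo["pop"].dtype)           # the original frame is unchanged
```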


## Series: a single column of data from a DataFrame

In [34]:
#Series
df.state

0    서울
1    서울
2    서울
3    부산
4    부산
5    부산
Name: state, dtype: object

In [30]:
type(df.state)

pandas.core.series.Series

In [31]:
df.state.value_counts()

서울    3
부산    3
Name: state, dtype: int64

In [39]:
df.state.shape

(6,)

In [37]:
#Unique values
df.state.unique()

array(['서울', '부산'], dtype=object)

In [38]:
#Number of unique values
df.state.nunique()

2

In [44]:
df['pop']

0    997.5
1    984.3
2    970.5
3    345.2
4    344.7
5    340.0
Name: pop, dtype: float64

In [51]:
#Arithmetic methods such as add, sub, mul, div, floordiv, mod (remainder), and pow (exponentiation) can be applied
df['pop'].add(1).sub(2).mul(3).div(6)

0    498.25
1    491.65
2    484.75
3    172.10
4    171.85
5    169.50
Name: pop, dtype: float64

<font color='blue'><p>
==> The sequential invocation of methods using the dot notation is referred to as **method chaining**.
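
As a rough check, a method chain like the one above should give the same values as the equivalent operator expression. The sketch below assumes a hypothetical Series `pop` holding the first three population values.

```python
import pandas as pd

# Hypothetical Series used only for this illustration.
pop = pd.Series([997.5, 984.3, 970.5], name="pop")

# Method chaining: each call returns a new Series that feeds the next call.
chained = pop.add(1).sub(2).mul(3).div(6)

# The same computation written with arithmetic operators.
with_operators = (pop + 1 - 2) * 3 / 6

print((chained == with_operators).all())
```

The method form is mainly useful when a long pipeline reads better top to bottom, or when extra arguments such as `fill_value` are needed.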

In [52]:
#Comparison methods are available too: lt (<), gt (>), le (<=), ge (>=), eq (==), ne (!=)
df['pop'].gt(500)

0     True
1     True
2     True
3    False
4    False
5    False
Name: pop, dtype: bool

In [53]:
#Compute each value's share of the total
df['pop'] / df['pop'].sum()

0    0.250490
1    0.247175
2    0.243710
3    0.086686
4    0.086560
5    0.085380
Name: pop, dtype: float64

## Summarizing Data
<img align="left" src="http://drive.google.com/uc?export=view&id=16CeMSPNyCjyICq_KlJm7ERWNoJfKwgZo">
<img align="left" src="http://drive.google.com/uc?export=view&id=1eCA7rRFojxc7WIG7xrVasuCbWuoY2KGT">

In [55]:
df

Unnamed: 0,state,year,pop
0,서울,2014,997.5
1,서울,2016,984.3
2,서울,2018,970.5
3,부산,2014,345.2
4,부산,2016,344.7
5,부산,2018,340.0


In [57]:
#Minimum (the default, axis=0, returns the minimum of each column)
df.min()

state       부산
year      2014
pop      340.0
dtype: object

In [59]:
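#Set axis=1 to get the minimum of each row
#(numeric_only=True skips the string column; newer pandas versions raise a TypeError when a row mixes strings and numbers)
df.min(axis=1, numeric_only=True)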

0    997.5
1    984.3
2    970.5
3    345.2
4    344.7
5    340.0
dtype: float64
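
The `axis` argument is easier to see on an all-numeric frame. The sketch below uses a hypothetical frame `nums`; `axis=0` collapses the rows (one result per column), while `axis=1` collapses the columns (one result per row).

```python
import pandas as pd

# Hypothetical all-numeric frame used only for this illustration.
nums = pd.DataFrame({"a": [1, 5, 3], "b": [4, 2, 6]})

col_min = nums.min(axis=0)   # one value per column: a -> 1, b -> 2
row_min = nums.min(axis=1)   # one value per row: 1, 2, 3

print(col_min.tolist())
print(row_min.tolist())
```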

In [64]:
#Sum (also works on string columns, where the strings are concatenated)
df.sum()

state    서울서울서울부산부산부산
year            12096
pop            3982.2
dtype: object

In [62]:
df.count()

state    6
year     6
pop      6
dtype: int64

In [65]:
#Compute quantiles (not defined for string columns, so restrict the calculation to numeric columns)
df.quantile(q=0.5, numeric_only=True)

year    2016.00
pop      657.85
Name: 0.5, dtype: float64

In [75]:
'''
Assigning a DataFrame to another variable (e.g. df2 = df1)
copies only the reference (address), not the contents.
To copy the contents into another variable while keeping
the original data intact, use the copy method.
'''
dx = df.copy()
display(dx)
print('↑ dx (a copy of df)')

dy = dx

dx.iloc[1,1] = pd.NA  # use pd.NA or NumPy NaN (np.nan) to represent a missing value.

display(dx)
print('↑ dx after inserting an NA value')

display(dy)
print('↑ dy (dy = dx): identical to dx.')

display(df)
print('↑ df: dx was modified, but the original data is preserved.')

Unnamed: 0,state,year,pop
0,서울,2014,997.5
1,서울,2016,984.3
2,서울,2018,970.5
3,부산,2014,345.2
4,부산,2016,344.7
5,부산,2018,340.0


↑ dx (a copy of df)


Unnamed: 0,state,year,pop
0,서울,2014.0,997.5
1,서울,,984.3
2,서울,2018.0,970.5
3,부산,2014.0,345.2
4,부산,2016.0,344.7
5,부산,2018.0,340.0


↑ dx after inserting an NA value


Unnamed: 0,state,year,pop
0,서울,2014.0,997.5
1,서울,,984.3
2,서울,2018.0,970.5
3,부산,2014.0,345.2
4,부산,2016.0,344.7
5,부산,2018.0,340.0


↑ dy (dy = dx): identical to dx.


Unnamed: 0,state,year,pop
0,서울,2014,997.5
1,서울,2016,984.3
2,서울,2018,970.5
3,부산,2014,345.2
4,부산,2016,344.7
5,부산,2018,340.0


↑ df: dx was modified, but the original data is preserved.
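
The difference between plain assignment and `copy` can also be checked directly with `is`. This is a minimal sketch on a hypothetical one-column frame; the names `original`, `alias`, and `independent` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical frame used only for this illustration.
original = pd.DataFrame({"x": [1.0, 2.0, 3.0]})

alias = original                # a second name for the same object
independent = original.copy()   # a separate object with its own data

alias.loc[0, "x"] = 99.0        # modify through the alias

print(alias is original)             # same object
print(original.loc[0, "x"])          # the change is visible via `original`
print(independent.loc[0, "x"])       # the copy keeps the old value
```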


## Indexing and Slicing
<img src='http://drive.google.com/uc?export=view&id=1eyYiZP5xQr5M6IIw9lnKjoAYY1m4YmiI'>

In [76]:
df

Unnamed: 0,state,year,pop
0,서울,2014,997.5
1,서울,2016,984.3
2,서울,2018,970.5
3,부산,2014,345.2
4,부산,2016,344.7
5,부산,2018,340.0


- <font color='blue'>Some rows (all columns in a DataFrame)</font>

In [85]:
#'loc' is label-based, i.e. it uses the index labels, so this selects every row up to and including the label 3
df.loc[:3]

Unnamed: 0,state,year,pop
0,서울,2014,997.5
1,서울,2016,984.3
2,서울,2018,970.5
3,부산,2014,345.2


In [87]:
#'iloc' is position-based. ':3' means positions < 3, so the rows at positions 0 through 2 are selected
#iloc is convenient when you want a fixed number of rows.
df.iloc[:3]

Unnamed: 0,state,year,pop
0,서울,2014,997.5
1,서울,2016,984.3
2,서울,2018,970.5
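
With the default integer index, labels and positions coincide, which can hide the difference between the two accessors. The sketch below assumes a hypothetical frame `demo` with string labels, so that the inclusive end of a `loc` slice and the exclusive end of an `iloc` slice are easy to tell apart.

```python
import pandas as pd

# Hypothetical frame with string row labels, used only for this illustration.
demo = pd.DataFrame({"value": [10, 20, 30, 40]}, index=["a", "b", "c", "d"])

by_label = demo.loc["a":"c"]   # label slice: the end label 'c' is included
by_position = demo.iloc[0:2]   # position slice: position 2 is excluded

print(by_label.index.tolist())
print(by_position.index.tolist())
```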


- <font color='blue'>All rows, some columns

In [88]:
#A bare ':' selects all rows (or all columns)
df.iloc[:, :2]

Unnamed: 0,state,year
0,서울,2014
1,서울,2016
2,서울,2018
3,부산,2014
4,부산,2016
5,부산,2018


In [89]:
df.iloc[:2, :]

Unnamed: 0,state,year,pop
0,서울,2014,997.5
1,서울,2016,984.3


In [94]:
df.loc[:, 'pop']

0    997.5
1    984.3
2    970.5
3    345.2
4    344.7
5    340.0
Name: pop, dtype: float64

In [95]:
df.iloc[:,:]

Unnamed: 0,state,year,pop
0,서울,2014,997.5
1,서울,2016,984.3
2,서울,2018,970.5
3,부산,2014,345.2
4,부산,2016,344.7
5,부산,2018,340.0


## Boolean Indexing

In [97]:
df

Unnamed: 0,state,year,pop
0,서울,2014,997.5
1,서울,2016,984.3
2,서울,2018,970.5
3,부산,2014,345.2
4,부산,2016,344.7
5,부산,2018,340.0


- Select the rows where the population is greater than 500

In [103]:
#1.
df[df['pop'] > 500]

Unnamed: 0,state,year,pop
0,서울,2014,997.5
1,서울,2016,984.3
2,서울,2018,970.5


In [99]:
#2. The query method offers a more convenient way to select the same rows
df.query('pop > 500')

Unnamed: 0,state,year,pop
0,서울,2014,997.5
1,서울,2016,984.3
2,서울,2018,970.5


In [101]:
#Methods such as mean() can be called inside a query expression
df.query('pop > pop.mean()')[['year','pop']]

Unnamed: 0,year,pop
0,2014,997.5
1,2016,984.3
2,2018,970.5


In [102]:
#Use 'and' to combine several conditions inside a query expression.
#To use a string literal inside a query, quote it with a different quote style from the outer quotes.
df.query('year in [2014,2016] and state == "부산"')

Unnamed: 0,state,year,pop
3,부산,2014,345.2
4,부산,2016,344.7
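
A query expression and a Boolean mask are two spellings of the same filter. The sketch below assumes a hypothetical frame `demo` and a local variable `threshold`; inside `query`, a Python variable is referenced with the `@` prefix, while the mask form combines conditions with `&` and parentheses.

```python
import pandas as pd

# Hypothetical frame used only for this illustration.
demo = pd.DataFrame({
    "state": ["S", "S", "B", "B"],
    "year": [2014, 2016, 2014, 2016],
    "population": [997.5, 984.3, 345.2, 344.7],
})

threshold = 500  # local variable referenced from the query string via '@'

via_query = demo.query("population > @threshold and year == 2014")
via_mask = demo[(demo["population"] > threshold) & (demo["year"] == 2014)]

print(via_query.equals(via_mask))
print(via_query.index.tolist())
```

In the mask form, each comparison needs its own parentheses because `&` binds more tightly than `>` and `==`.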


<font color = "blue"><p>
## Exercises

**Exercise 21**: Select the first row of data (the first record) from `movie`.

In [105]:
movie

Unnamed: 0,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
0,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000
1,Color,Gore Verbinski,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,...,1238.0,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0
2,Color,Sam Mendes,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller,...,994.0,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000
3,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,...,2701.0,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000
4,,Doug Walker,,,131.0,,Rob Walker,131.0,,Documentary,...,,,,,,,12.0,7.1,,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4911,Color,Scott Smith,1.0,87.0,2.0,318.0,Daphne Zuniga,637.0,,Comedy|Drama,...,6.0,English,Canada,,,2013.0,470.0,7.7,,84
4912,Color,,43.0,43.0,,319.0,Valorie Curry,841.0,,Crime|Drama|Mystery|Thriller,...,359.0,English,USA,TV-14,,,593.0,7.5,16.00,32000
4913,Color,Benjamin Roberds,13.0,76.0,0.0,0.0,Maxwell Moody,0.0,,Drama|Horror|Thriller,...,3.0,English,USA,,1400.0,2013.0,0.0,6.3,,16
4914,Color,Daniel Hsia,14.0,100.0,0.0,489.0,Daniel Henney,946.0,10443.0,Comedy|Drama|Romance,...,9.0,English,USA,PG-13,,2012.0,719.0,6.3,2.35,660


In [107]:
#A1. Using head
movie.head(1)

Unnamed: 0,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
0,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000


In [114]:
#A2. Using iloc
movie.iloc[0:1]

Unnamed: 0,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
0,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000


In [118]:
#A3. Using loc
movie.loc[:0]

Unnamed: 0,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
0,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000


In [119]:
#A4. Using query
movie.query('index == 0')

Unnamed: 0,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
0,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000


**Exercise 22**: Select the first 10 values from the `director_name` column in `movie`.

In [122]:
movie

Unnamed: 0,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
0,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000
1,Color,Gore Verbinski,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,...,1238.0,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0
2,Color,Sam Mendes,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller,...,994.0,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000
3,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,...,2701.0,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000
4,,Doug Walker,,,131.0,,Rob Walker,131.0,,Documentary,...,,,,,,,12.0,7.1,,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4911,Color,Scott Smith,1.0,87.0,2.0,318.0,Daphne Zuniga,637.0,,Comedy|Drama,...,6.0,English,Canada,,,2013.0,470.0,7.7,,84
4912,Color,,43.0,43.0,,319.0,Valorie Curry,841.0,,Crime|Drama|Mystery|Thriller,...,359.0,English,USA,TV-14,,,593.0,7.5,16.00,32000
4913,Color,Benjamin Roberds,13.0,76.0,0.0,0.0,Maxwell Moody,0.0,,Drama|Horror|Thriller,...,3.0,English,USA,,1400.0,2013.0,0.0,6.3,,16
4914,Color,Daniel Hsia,14.0,100.0,0.0,489.0,Daniel Henney,946.0,10443.0,Comedy|Drama|Romance,...,9.0,English,USA,PG-13,,2012.0,719.0,6.3,2.35,660


In [132]:
#A1. Using head
DataFrame(movie.director_name.head(10))

Unnamed: 0,director_name
0,James Cameron
1,Gore Verbinski
2,Sam Mendes
3,Christopher Nolan
4,Doug Walker
5,Andrew Stanton
6,Sam Raimi
7,Nathan Greno
8,Joss Whedon
9,David Yates


In [133]:
#A2. Using iloc
DataFrame(movie.iloc[:10, 1])

Unnamed: 0,director_name
0,James Cameron
1,Gore Verbinski
2,Sam Mendes
3,Christopher Nolan
4,Doug Walker
5,Andrew Stanton
6,Sam Raimi
7,Nathan Greno
8,Joss Whedon
9,David Yates


In [139]:
#A3. Using loc
DataFrame(movie.loc[:9, 'director_name'])

Unnamed: 0,director_name
0,James Cameron
1,Gore Verbinski
2,Sam Mendes
3,Christopher Nolan
4,Doug Walker
5,Andrew Stanton
6,Sam Raimi
7,Nathan Greno
8,Joss Whedon
9,David Yates


In [143]:
#A4. Select only the rows with iloc, then pick the column
DataFrame(movie.iloc[:10].director_name)

Unnamed: 0,director_name
0,James Cameron
1,Gore Verbinski
2,Sam Mendes
3,Christopher Nolan
4,Doug Walker
5,Andrew Stanton
6,Sam Raimi
7,Nathan Greno
8,Joss Whedon
9,David Yates


In [144]:
#A5. Select only the rows with loc, then pick the column
DataFrame(movie.loc[:9].director_name)

Unnamed: 0,director_name
0,James Cameron
1,Gore Verbinski
2,Sam Mendes
3,Christopher Nolan
4,Doug Walker
5,Andrew Stanton
6,Sam Raimi
7,Nathan Greno
8,Joss Whedon
9,David Yates
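
The answers above wrap a Series in `DataFrame(...)` to get a one-column table. As a sketch of two alternatives, assuming a small hypothetical frame `demo` in place of `movie`: selecting with a list of column names returns a DataFrame directly, and `to_frame` converts a Series.

```python
import pandas as pd

# Hypothetical stand-in for `movie`, used only for this illustration.
demo = pd.DataFrame({"director_name": ["x", "y", "z"], "score": [1, 2, 3]})

wrapped = pd.DataFrame(demo["director_name"].head(2))   # approach used above
bracketed = demo[["director_name"]].head(2)             # list of names -> DataFrame
framed = demo["director_name"].head(2).to_frame()       # Series -> DataFrame

print(wrapped.equals(bracketed), wrapped.equals(framed))
```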


**Exercise 23**: Select the `country` and `budget` columns of the first 101 records.

In [148]:
#A1. Using head
movie.head(101)[['country','budget']]

Unnamed: 0,country,budget
0,USA,237000000.0
1,USA,300000000.0
2,UK,245000000.0
3,USA,250000000.0
4,,
...,...,...
96,USA,165000000.0
97,USA,160000000.0
98,Japan,
99,USA,180000000.0


In [150]:
#A2. Using iloc
movie.iloc[:101][['country','budget']]

Unnamed: 0,country,budget
0,USA,237000000.0
1,USA,300000000.0
2,UK,245000000.0
3,USA,250000000.0
4,,
...,...,...
96,USA,165000000.0
97,USA,160000000.0
98,Japan,
99,USA,180000000.0


In [151]:
#A3. Using loc
movie.loc[:100, ['country','budget']]

Unnamed: 0,country,budget
0,USA,237000000.0
1,USA,300000000.0
2,UK,245000000.0
3,USA,250000000.0
4,,
...,...,...
96,USA,165000000.0
97,USA,160000000.0
98,Japan,
99,USA,180000000.0


**Exercise 24**: Select the `imdb_score` column for the last 1000 movies.

In [154]:
#A1. Using tail
DataFrame(movie.tail(1000).imdb_score)

Unnamed: 0,imdb_score
3916,6.8
3917,3.9
3918,6.1
3919,7.5
3920,8.2
...,...
4911,7.7
4912,7.5
4913,6.3
4914,6.3


In [160]:
#A2. Using iloc (iloc accepts negative indices)
DataFrame(movie.iloc[-1000:].imdb_score)

Unnamed: 0,imdb_score
3916,6.8
3917,3.9
3918,6.1
3919,7.5
3920,8.2
...,...
4911,7.7
4912,7.5
4913,6.3
4914,6.3
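
The two answers above rely on `tail(n)` and `iloc[-n:]` selecting the same rows. A minimal check on a hypothetical ten-row frame `demo`:

```python
import pandas as pd

# Hypothetical ten-row frame used only for this illustration.
demo = pd.DataFrame({"v": range(10)})

last_by_tail = demo.tail(3)      # last three rows
last_by_iloc = demo.iloc[-3:]    # negative start counts from the end

print(last_by_tail.equals(last_by_iloc))
print(last_by_iloc.index.tolist())
```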


**Exercise 25**: Select movies made in `South Korea`. Hint: `movie.country` equals what?

In [168]:
#A1. Using query
movie.query('country == "South Korea"')

Unnamed: 0,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
1063,Color,Terence Young,10.0,140.0,92.0,522.0,Ben Gazzara,1000.0,,Drama|History|War,...,16.0,English,South Korea,PG,48000000.0,1981.0,623.0,2.7,2.35,115
1313,Color,Joon-ho Bong,488.0,126.0,584.0,398.0,Ewen Bremner,11000.0,4563029.0,Action|Drama|Sci-Fi|Thriller,...,514.0,English,South Korea,R,39200000.0,2013.0,557.0,7.0,1.85,58000
1549,Color,Hyung-rae Shim,93.0,107.0,26.0,535.0,Aimee Garcia,889.0,10956379.0,Action|Drama|Fantasy|Horror|Thriller,...,364.0,English,South Korea,PG-13,35000000.0,2007.0,618.0,3.6,2.35,0
2576,Color,JK Youn,43.0,103.0,2.0,13.0,Ji-won Ha,94.0,,Action|Comedy|Drama|Thriller,...,25.0,Korean,South Korea,R,,2009.0,80.0,5.7,2.35,558
2788,Color,Hyung-rae Shim,4.0,100.0,26.0,385.0,Stephanie Danielson,898.0,163591.0,Comedy,...,17.0,English,South Korea,PG-13,13400000.0,2010.0,391.0,3.6,1.78,502
2845,Color,John H. Lee,2.0,115.0,32.0,29.0,Dean Dawson,14000.0,31662.0,Action|Drama|History|War,...,1.0,English,South Korea,,12620000.0,2016.0,81.0,6.8,,139
2867,Color,Je-kyu Kang,86.0,148.0,16.0,489.0,Bin Won,717.0,1110186.0,Action|Drama|War,...,224.0,Korean,South Korea,R,12800000.0,2004.0,517.0,8.1,2.35,0
3191,Color,Taedong Park,8.0,85.0,0.0,196.0,Tom Arnold,1000.0,,Adventure|Animation,...,2.0,English,South Korea,,10000000.0,2014.0,618.0,4.8,,186
3198,Color,Jee-woon Kim,152.0,135.0,419.0,7.0,Woo-sung Jung,398.0,128486.0,Action|Adventure|Comedy|Western,...,74.0,Korean,South Korea,R,10000000.0,2008.0,149.0,7.3,2.35,0
3252,Color,Hong-jin Na,77.0,156.0,43.0,0.0,Jun Kunimura,45.0,770629.0,Fantasy|Horror|Mystery|Thriller,...,24.0,Korean,South Korea,Not Rated,,2016.0,5.0,7.7,2.35,0


In [169]:
#A2. Using a Boolean mask
movie[movie.country=='South Korea']

Unnamed: 0,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
1063,Color,Terence Young,10.0,140.0,92.0,522.0,Ben Gazzara,1000.0,,Drama|History|War,...,16.0,English,South Korea,PG,48000000.0,1981.0,623.0,2.7,2.35,115
1313,Color,Joon-ho Bong,488.0,126.0,584.0,398.0,Ewen Bremner,11000.0,4563029.0,Action|Drama|Sci-Fi|Thriller,...,514.0,English,South Korea,R,39200000.0,2013.0,557.0,7.0,1.85,58000
1549,Color,Hyung-rae Shim,93.0,107.0,26.0,535.0,Aimee Garcia,889.0,10956379.0,Action|Drama|Fantasy|Horror|Thriller,...,364.0,English,South Korea,PG-13,35000000.0,2007.0,618.0,3.6,2.35,0
2576,Color,JK Youn,43.0,103.0,2.0,13.0,Ji-won Ha,94.0,,Action|Comedy|Drama|Thriller,...,25.0,Korean,South Korea,R,,2009.0,80.0,5.7,2.35,558
2788,Color,Hyung-rae Shim,4.0,100.0,26.0,385.0,Stephanie Danielson,898.0,163591.0,Comedy,...,17.0,English,South Korea,PG-13,13400000.0,2010.0,391.0,3.6,1.78,502
2845,Color,John H. Lee,2.0,115.0,32.0,29.0,Dean Dawson,14000.0,31662.0,Action|Drama|History|War,...,1.0,English,South Korea,,12620000.0,2016.0,81.0,6.8,,139
2867,Color,Je-kyu Kang,86.0,148.0,16.0,489.0,Bin Won,717.0,1110186.0,Action|Drama|War,...,224.0,Korean,South Korea,R,12800000.0,2004.0,517.0,8.1,2.35,0
3191,Color,Taedong Park,8.0,85.0,0.0,196.0,Tom Arnold,1000.0,,Adventure|Animation,...,2.0,English,South Korea,,10000000.0,2014.0,618.0,4.8,,186
3198,Color,Jee-woon Kim,152.0,135.0,419.0,7.0,Woo-sung Jung,398.0,128486.0,Action|Adventure|Comedy|Western,...,74.0,Korean,South Korea,R,10000000.0,2008.0,149.0,7.3,2.35,0
3252,Color,Hong-jin Na,77.0,156.0,43.0,0.0,Jun Kunimura,45.0,770629.0,Fantasy|Horror|Mystery|Thriller,...,24.0,Korean,South Korea,Not Rated,,2016.0,5.0,7.7,2.35,0


**Exercise 26**: Select movies whose `gross` is `NaN`.

In [174]:
#A1. Using isna with a Boolean mask
movie[movie.gross.isna()]

Unnamed: 0,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
4,,Doug Walker,,,131.0,,Rob Walker,131.0,,Documentary,...,,,,,,,12.0,7.1,,0
84,Color,Roland Joffé,10.0,109.0,596.0,283.0,Alice Englert,622.0,,Action|Adventure|Romance|Sci-Fi,...,15.0,English,Belgium,R,,2015.0,525.0,4.5,,677
98,Color,Hideaki Anno,1.0,120.0,28.0,12.0,Shin'ya Tsukamoto,544.0,,Action|Adventure|Drama|Horror|Sci-Fi,...,13.0,Japanese,Japan,,,2016.0,106.0,8.2,2.35,0
176,Color,,21.0,60.0,,184.0,Philip Michael Thomas,982.0,,Action|Crime|Drama|Mystery|Thriller,...,74.0,English,USA,TV-14,1500000.0,,321.0,7.5,1.33,0
197,Color,Matt Birch,1.0,,0.0,159.0,Dave Legeno,10000.0,,Action|Fantasy,...,2.0,English,UK,,,2011.0,570.0,7.5,,40
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4905,Color,Ash Baron-Cohen,10.0,98.0,3.0,152.0,Stanley B. Herman,789.0,,Crime|Drama,...,14.0,English,USA,,,1995.0,194.0,6.4,,20
4909,Color,Anthony Vallone,,84.0,2.0,2.0,John Considine,45.0,,Crime|Drama,...,1.0,English,USA,PG-13,3250.0,2005.0,44.0,7.8,,4
4911,Color,Scott Smith,1.0,87.0,2.0,318.0,Daphne Zuniga,637.0,,Comedy|Drama,...,6.0,English,Canada,,,2013.0,470.0,7.7,,84
4912,Color,,43.0,43.0,,319.0,Valorie Curry,841.0,,Crime|Drama|Mystery|Thriller,...,359.0,English,USA,TV-14,,,593.0,7.5,16.00,32000


In [181]:
#A2. Inside query, 'gross != gross' is True only for NaN, because NaN never equals itself
movie.query('gross != gross')

Unnamed: 0,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
4,,Doug Walker,,,131.0,,Rob Walker,131.0,,Documentary,...,,,,,,,12.0,7.1,,0
84,Color,Roland Joffé,10.0,109.0,596.0,283.0,Alice Englert,622.0,,Action|Adventure|Romance|Sci-Fi,...,15.0,English,Belgium,R,,2015.0,525.0,4.5,,677
98,Color,Hideaki Anno,1.0,120.0,28.0,12.0,Shin'ya Tsukamoto,544.0,,Action|Adventure|Drama|Horror|Sci-Fi,...,13.0,Japanese,Japan,,,2016.0,106.0,8.2,2.35,0
176,Color,,21.0,60.0,,184.0,Philip Michael Thomas,982.0,,Action|Crime|Drama|Mystery|Thriller,...,74.0,English,USA,TV-14,1500000.0,,321.0,7.5,1.33,0
197,Color,Matt Birch,1.0,,0.0,159.0,Dave Legeno,10000.0,,Action|Fantasy,...,2.0,English,UK,,,2011.0,570.0,7.5,,40
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4905,Color,Ash Baron-Cohen,10.0,98.0,3.0,152.0,Stanley B. Herman,789.0,,Crime|Drama,...,14.0,English,USA,,,1995.0,194.0,6.4,,20
4909,Color,Anthony Vallone,,84.0,2.0,2.0,John Considine,45.0,,Crime|Drama,...,1.0,English,USA,PG-13,3250.0,2005.0,44.0,7.8,,4
4911,Color,Scott Smith,1.0,87.0,2.0,318.0,Daphne Zuniga,637.0,,Comedy|Drama,...,6.0,English,Canada,,,2013.0,470.0,7.7,,84
4912,Color,,43.0,43.0,,319.0,Valorie Curry,841.0,,Crime|Drama|Mystery|Thriller,...,359.0,English,USA,TV-14,,,593.0,7.5,16.00,32000


In [183]:
#A3. Using loc
movie.loc[movie.gross.isna(), :]

Unnamed: 0,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
4,,Doug Walker,,,131.0,,Rob Walker,131.0,,Documentary,...,,,,,,,12.0,7.1,,0
84,Color,Roland Joffé,10.0,109.0,596.0,283.0,Alice Englert,622.0,,Action|Adventure|Romance|Sci-Fi,...,15.0,English,Belgium,R,,2015.0,525.0,4.5,,677
98,Color,Hideaki Anno,1.0,120.0,28.0,12.0,Shin'ya Tsukamoto,544.0,,Action|Adventure|Drama|Horror|Sci-Fi,...,13.0,Japanese,Japan,,,2016.0,106.0,8.2,2.35,0
176,Color,,21.0,60.0,,184.0,Philip Michael Thomas,982.0,,Action|Crime|Drama|Mystery|Thriller,...,74.0,English,USA,TV-14,1500000.0,,321.0,7.5,1.33,0
197,Color,Matt Birch,1.0,,0.0,159.0,Dave Legeno,10000.0,,Action|Fantasy,...,2.0,English,UK,,,2011.0,570.0,7.5,,40
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4905,Color,Ash Baron-Cohen,10.0,98.0,3.0,152.0,Stanley B. Herman,789.0,,Crime|Drama,...,14.0,English,USA,,,1995.0,194.0,6.4,,20
4909,Color,Anthony Vallone,,84.0,2.0,2.0,John Considine,45.0,,Crime|Drama,...,1.0,English,USA,PG-13,3250.0,2005.0,44.0,7.8,,4
4911,Color,Scott Smith,1.0,87.0,2.0,318.0,Daphne Zuniga,637.0,,Comedy|Drama,...,6.0,English,Canada,,,2013.0,470.0,7.7,,84
4912,Color,,43.0,43.0,,319.0,Valorie Curry,841.0,,Crime|Drama|Mystery|Thriller,...,359.0,English,USA,TV-14,,,593.0,7.5,16.00,32000
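
The `gross != gross` answer works because NaN compares unequal to everything, including itself. The sketch below illustrates this on a hypothetical three-row frame `demo` (not the `movie` data) and confirms that `isna` and the query select the same rows.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value, used only for this illustration.
demo = pd.DataFrame({"title": ["a", "b", "c"], "gross": [1.0, np.nan, 3.0]})

print(np.nan != np.nan)   # NaN is never equal to itself

via_isna = demo[demo["gross"].isna()]
via_query = demo.query("gross != gross")

print(via_isna.index.tolist())
print(via_query.index.tolist())
```

`isna` states the intent more plainly, so it is usually the more readable choice of the two.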
