# 2. Wrangling Data in Python with pandas
**Let's work with DataFrames using pandas.**

1. Getting started with pandas
    - Prerequisite: Table
    - Importing pandas

2. Handling 1-D data with pandas - Series
    - Creating a Series
    - Series vs ndarray
    - Series vs dict
    - Naming a Series
3. Handling 2-D data with pandas - DataFrame
    - Creating a DataFrame
    - From CSV to DataFrame
    - Accessing DataFrame data

[COVID data used in this lesson](https://www.kaggle.com/imdevskp/corona-virus-report)

## I. Getting started with pandas

### Prerequisite: Table
- A data structure (a kind of container) that stores and manages data in rows and columns
- row -> object, column -> attribute
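
The row/column idea above can be sketched in a few lines. The records below are hypothetical and only illustrate that each row is one object and each column is one attribute of that object.

```python
import pandas as pd

# Hypothetical records: each dict is one object (row),
# each key is one attribute (column).
students = [
    {"name": "Kim", "age": 21, "major": "Physics"},
    {"name": "Lee", "age": 23, "major": "Math"},
    {"name": "Park", "age": 22, "major": "Programming"},
]

table = pd.DataFrame(students)

n_rows, n_cols = table.shape      # (number of objects, number of attributes)
print(table)
print(n_rows, "rows x", n_cols, "columns")
print(list(table.columns))        # attribute names
```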

In [24]:
import pandas as pd
import numpy as np

## II. Handling 1-D data with pandas - Series

### Series?
- 1D labeled **array**
- You can specify the index.
- It can hold any data type (integers, floats, strings, Python objects), not only numbers.
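
As a small illustration of these points, the sketch below builds a Series of strings with a custom index. The country codes and values are made up for the example.

```python
import pandas as pd

# A Series of strings with a custom (hypothetical) index
capitals = pd.Series(["Seoul", "Tokyo", "Paris"], index=["KR", "JP", "FR"])

print(capitals["JP"])          # label-based lookup
print(list(capitals.index))    # the labels we supplied
print(capitals.dtype)          # not a numeric dtype
```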

In [307]:
# From a list
s = pd.Series([1, 4, 5, 16, 26])
print(s); print()
print(s[0]); print()

# Boolean indexing: keep only the values greater than the median
print("Condition\n", s[s > s.median()]); print()

# Select several labels at once, in the given order
print(s[[3, 1, 4]]); print()

# NumPy functions can be applied to a pandas Series
print(np.exp(s)); print()

# Check the dtype (same as for an np.ndarray)
print(s.dtype)
print(s.name)

0     1
1     4
2     5
3    16
4    26
dtype: int64

1

Condition
 3    16
4    26
dtype: int64

3    16
1     4
4    26
dtype: int64

0    2.718282e+00
1    5.459815e+01
2    1.484132e+02
3    8.886111e+06
4    1.957296e+11
dtype: float64

int64
None
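
One difference between a Series and an ndarray that the cell above does not show is index alignment. The following is a minimal sketch with made-up labels: arithmetic between two Series matches values by label, while arithmetic between ndarrays matches them by position.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["z", "y", "x"])

# Series arithmetic aligns on index labels
aligned = a + b
print(aligned.to_dict())

# ndarray arithmetic is purely positional
positional = a.to_numpy() + b.to_numpy()
print(positional.tolist())
```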


In [38]:
# From a dictionary
t = pd.Series({'one': 1, 'two': 2, "three": 3, "four": 4})
print(t); print()
print(t[1:3]); print()
print(t['one']); print()

# Add a value to the Series
t['zero'] = 0
print(t); print()  # appended at the end

# Membership test with `in` (checks the index labels)
print('six' in t)
print('zero' in t)
print(t.get('six', "none"))
print(t.get('zero', "none"))

one      1
two      2
three    3
four     4
dtype: int64

two      2
three    3
dtype: int64

1

one      1
two      2
three    3
four     4
zero     0
dtype: int64

False
True
none
0


In [9]:
# Extra: a Series can also hold Python objects such as lists (dtype: object)
s1 = pd.Series([[1,1],[2, 4], [3, 9], [4, 16], [5,26]])

s1

0     [1, 1]
1     [2, 4]
2     [3, 9]
3    [4, 16]
4    [5, 26]
dtype: object

### Naming a Series

- A Series has a `name` attribute.
- You can set the name when you first create the Series.
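
The name can also be changed after creation. The sketch below uses made-up values and shows `rename`, which returns a new Series, and `to_frame`, where the name becomes the column label.

```python
import pandas as pd

s_named = pd.Series([1.5, 2.5, 3.5], name="before")

renamed = s_named.rename("after")   # returns a new Series; s_named is unchanged
frame = renamed.to_frame()          # the name becomes the column label

print(s_named.name, renamed.name)
print(list(frame.columns))
```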

In [43]:
s = pd.Series(np.random.randn(5), name="random_nums")

print(s); print()
print(s.name)

0   -0.248666
1   -0.733661
2    1.250351
3   -1.461810
4    1.028626
Name: random_nums, dtype: float64

random_nums


## III. Handling 2-D data with pandas - DataFrame

### DataFrame?
- 2-D labeled __table__
- You can specify the index.
- Each column can hold its own data type, not just numbers.

In [66]:
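# From a dictionary: keys become column names, lists become column values
d = {"height": [1, 2, 3, 4],
     "weight": [30, 40, 50, 60]}

df = pd.DataFrame(d)
print(d)
print(df)
df       # rendered differently from print(); prefer the notebook's rich display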

{'height': [1, 2, 3, 4], 'weight': [30, 40, 50, 60]}
   height  weight
0       1      30
1       2      40
2       3      50
3       4      60


Unnamed: 0,height,weight
0,1,30
1,2,40
2,3,50
3,4,60


In [68]:
# Check the dtype of each column
df.dtypes

height    int64
weight    int64
dtype: object

### Comma-Separated Values (CSV)
- csv -> DataFrame
- Use `pd.read_csv()`
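
To try `read_csv` without a file on disk, the sketch below feeds it CSV text through `io.StringIO`. The three rows are a shortened excerpt of the covid table used in this lesson, reduced to three columns for illustration.

```python
import io
import pandas as pd

# CSV text standing in for a file on disk (shortened excerpt)
csv_text = """Country/Region,Confirmed,WHO Region
Afghanistan,36263,Eastern Mediterranean
Albania,4880,Europe
Algeria,27973,Africa
"""

# read_csv accepts a path or any file-like object
small = pd.read_csv(io.StringIO(csv_text))
print(small.shape)
print(small.dtypes)

# index_col picks a column to use as the row index
by_country = pd.read_csv(io.StringIO(csv_text), index_col="Country/Region")
print(by_country.loc["Albania", "Confirmed"])
```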

In [77]:
# country_wise_latest.csv is in the same directory as this notebook
covid = pd.read_csv("./country_wise_latest.csv")

covid

Unnamed: 0,Country/Region,Confirmed,Deaths,Recovered,Active,New cases,New deaths,New recovered,Deaths / 100 Cases,Recovered / 100 Cases,Deaths / 100 Recovered,Confirmed last week,1 week change,1 week % increase,WHO Region
0,Afghanistan,36263,1269,25198,9796,106,10,18,3.50,69.49,5.04,35526,737,2.07,Eastern Mediterranean
1,Albania,4880,144,2745,1991,117,6,63,2.95,56.25,5.25,4171,709,17.00,Europe
2,Algeria,27973,1163,18837,7973,616,8,749,4.16,67.34,6.17,23691,4282,18.07,Africa
3,Andorra,907,52,803,52,10,0,0,5.73,88.53,6.48,884,23,2.60,Europe
4,Angola,950,41,242,667,18,1,0,4.32,25.47,16.94,749,201,26.84,Africa
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
182,West Bank and Gaza,10621,78,3752,6791,152,2,0,0.73,35.33,2.08,8916,1705,19.12,Eastern Mediterranean
183,Western Sahara,10,1,8,1,0,0,0,10.00,80.00,12.50,10,0,0.00,Africa
184,Yemen,1691,483,833,375,10,4,36,28.56,49.26,57.98,1619,72,4.45,Eastern Mediterranean
185,Zambia,4552,140,2815,1597,71,1,465,3.08,61.84,4.97,3326,1226,36.86,Africa


### Using pandas 1. Viewing part of the data

In [86]:
# View the first 5 rows
covid.head(5)

Unnamed: 0,Country/Region,Confirmed,Deaths,Recovered,Active,New cases,New deaths,New recovered,Deaths / 100 Cases,Recovered / 100 Cases,Deaths / 100 Recovered,Confirmed last week,1 week change,1 week % increase,WHO Region
0,Afghanistan,36263,1269,25198,9796,106,10,18,3.5,69.49,5.04,35526,737,2.07,Eastern Mediterranean
1,Albania,4880,144,2745,1991,117,6,63,2.95,56.25,5.25,4171,709,17.0,Europe
2,Algeria,27973,1163,18837,7973,616,8,749,4.16,67.34,6.17,23691,4282,18.07,Africa
3,Andorra,907,52,803,52,10,0,0,5.73,88.53,6.48,884,23,2.6,Europe
4,Angola,950,41,242,667,18,1,0,4.32,25.47,16.94,749,201,26.84,Africa


In [89]:
# View the last 5 rows
covid.tail(5)

Unnamed: 0,Country/Region,Confirmed,Deaths,Recovered,Active,New cases,New deaths,New recovered,Deaths / 100 Cases,Recovered / 100 Cases,Deaths / 100 Recovered,Confirmed last week,1 week change,1 week % increase,WHO Region
182,West Bank and Gaza,10621,78,3752,6791,152,2,0,0.73,35.33,2.08,8916,1705,19.12,Eastern Mediterranean
183,Western Sahara,10,1,8,1,0,0,0,10.0,80.0,12.5,10,0,0.0,Africa
184,Yemen,1691,483,833,375,10,4,36,28.56,49.26,57.98,1619,72,4.45,Eastern Mediterranean
185,Zambia,4552,140,2815,1597,71,1,465,3.08,61.84,4.97,3326,1226,36.86,Africa
186,Zimbabwe,2704,36,542,2126,192,2,24,1.33,20.04,6.64,1713,991,57.85,Africa


### Using pandas 2. Accessing data
- `df['column_name']` or `df.column_name`
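
The two forms are not always interchangeable. The sketch below uses a small made-up frame, including a hypothetical column called `count`, to show where attribute access falls short: a name with a space cannot be written as an attribute, and a name that collides with a DataFrame method returns the method instead of the column.

```python
import pandas as pd

demo = pd.DataFrame({
    "Active": [9796, 1991],
    "New cases": [106, 117],
    "count": [1, 2],          # hypothetical column that shares a name with a method
})

# Both forms return the same Series when the name is a plain identifier
same = demo["Active"].equals(demo.Active)
print(same)

# A name containing a space can only be written with brackets
print(demo["New cases"].tolist())

# demo.count is the DataFrame method, not the column
print(callable(demo.count))
print(demo["count"].tolist())
```

Bracket access works for every column name, so it is the safer default.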

In [106]:
# key
covid["New cases"]

0      106
1      117
2      616
3       10
4       18
      ... 
182    152
183      0
184     10
185     71
186    192
Name: New cases, Length: 187, dtype: int64

In [104]:
# attribute
covid.Active

0      9796
1      1991
2      7973
3        52
4       667
       ... 
182    6791
183       1
184     375
185    1597
186    2126
Name: Active, Length: 187, dtype: int64

### Pro tip!
- Each column of a DataFrame is a Series!

In [113]:
print(covid['Confirmed'].dtype)
print(type(covid['Confirmed']))

int64
<class 'pandas.core.series.Series'>


In [117]:
print(covid['Confirmed'][0])
print(covid['Confirmed'][1:5])

36263
1     4880
2    27973
3      907
4      950
Name: Confirmed, dtype: int64


### Using pandas 3. Accessing data with conditions


In [124]:
# Find the countries with at least 100 new cases

covid[covid["New cases"] >= 100]
covid[covid["New cases"] >= 100].head(5)

Unnamed: 0,Country/Region,Confirmed,Deaths,Recovered,Active,New cases,New deaths,New recovered,Deaths / 100 Cases,Recovered / 100 Cases,Deaths / 100 Recovered,Confirmed last week,1 week change,1 week % increase,WHO Region
0,Afghanistan,36263,1269,25198,9796,106,10,18,3.5,69.49,5.04,35526,737,2.07,Eastern Mediterranean
1,Albania,4880,144,2745,1991,117,6,63,2.95,56.25,5.25,4171,709,17.0,Europe
2,Algeria,27973,1163,18837,7973,616,8,749,4.16,67.34,6.17,23691,4282,18.07,Africa
6,Argentina,167416,3059,72575,91782,4890,120,2057,1.83,43.35,4.21,130774,36642,28.02,Americas
8,Australia,15303,167,9311,5825,368,6,137,1.09,60.84,1.79,12428,2875,23.13,Western Pacific


In [135]:
# Find the countries whose WHO Region is South-East Asia
print(set(covid["WHO Region"].values))
print(covid["WHO Region"].unique())
print(covid["WHO Region"] == "South-East Asia")
covid[covid["WHO Region"] == "South-East Asia"]

{'Europe', 'South-East Asia', 'Africa', 'Western Pacific', 'Americas', 'Eastern Mediterranean'}
['Eastern Mediterranean' 'Europe' 'Africa' 'Americas' 'Western Pacific'
 'South-East Asia']
0      False
1      False
2      False
3      False
4      False
       ...  
182    False
183    False
184    False
185    False
186    False
Name: WHO Region, Length: 187, dtype: bool


Unnamed: 0,Country/Region,Confirmed,Deaths,Recovered,Active,New cases,New deaths,New recovered,Deaths / 100 Cases,Recovered / 100 Cases,Deaths / 100 Recovered,Confirmed last week,1 week change,1 week % increase,WHO Region
13,Bangladesh,226225,2965,125683,97577,2772,37,1801,1.31,55.56,2.36,207453,18772,9.05,South-East Asia
19,Bhutan,99,0,86,13,4,0,1,0.0,86.87,0.0,90,9,10.0,South-East Asia
27,Burma,350,6,292,52,0,0,2,1.71,83.43,2.05,341,9,2.64,South-East Asia
79,India,1480073,33408,951166,495499,44457,637,33598,2.26,64.26,3.51,1155338,324735,28.11,South-East Asia
80,Indonesia,100303,4838,58173,37292,1525,57,1518,4.82,58.0,8.32,88214,12089,13.7,South-East Asia
106,Maldives,3369,15,2547,807,67,0,19,0.45,75.6,0.59,2999,370,12.34,South-East Asia
119,Nepal,18752,48,13754,4950,139,3,626,0.26,73.35,0.35,17844,908,5.09,South-East Asia
158,Sri Lanka,2805,11,2121,673,23,0,15,0.39,75.61,0.52,2730,75,2.75,South-East Asia
167,Thailand,3297,58,3111,128,6,0,2,1.76,94.36,1.86,3250,47,1.45,South-East Asia
168,Timor-Leste,24,0,0,24,0,0,0,0.0,0.0,0.0,24,0,0.0,South-East Asia


### Using pandas 4. Accessing data by row

In [149]:
# Example data: library book information
books_dict = {"Available": [True, True, False], 
              "Location": [102, 215, 323], 
              "Genre": ["Programming", "Physics", "Math"]}

books_df = pd.DataFrame(books_dict, index=["버그란 무엇인가", "두근두근 물리학", "미분해줘 홈즈"])

books_df

Unnamed: 0,Available,Location,Genre
버그란 무엇인가,True,102,Programming
두근두근 물리학,True,215,Physics
미분해줘 홈즈,False,323,Math


### Selecting by index label: `.loc[row, col]`

In [171]:
books_df.loc["버그란 무엇인가"]

Available           True
Location             102
Genre        Programming
Name: 버그란 무엇인가, dtype: object

In [179]:
# Is the book "미분해줘 홈즈" available? (both forms return the same value)
books_df.loc["미분해줘 홈즈"]["Available"]
books_df.loc["미분해줘 홈즈", "Available"]

False

Unnamed: 0,Available,Location,Genre
두근두근 물리학,True,215,Physics
미분해줘 홈즈,False,323,Math


### Selecting by integer position: `.iloc[rowidx, colidx]`

In [211]:
print(books_df.iloc[0, 1]); print()
print(books_df[0:4][1:]); print()
# books_df[0:4, 1:2] is not allowed; use .iloc for 2-D positional selection


books_df.iloc[[0, 2], 2:3]
books_df.iloc[0: 2, 2:3]

102

          Available  Location    Genre
두근두근 물리학       True       215  Physics
미분해줘 홈즈       False       323     Math



Unnamed: 0,Genre
버그란 무엇인가,Programming
두근두근 물리학,Physics
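
One detail worth checking when switching between `.loc` and `.iloc` is how slices end. The sketch below uses a made-up frame: a label slice includes its end label, while a position slice excludes its end position.

```python
import pandas as pd

# Hypothetical frame with string labels
prices = pd.DataFrame(
    {"price": [10, 20, 30, 40]},
    index=["a", "b", "c", "d"],
)

by_label = prices.loc["a":"c"]      # label slice: the end label "c" is included
by_position = prices.iloc[0:2]      # position slice: the end position 2 is excluded

print(list(by_label.index))
print(list(by_position.index))
print(prices.loc["b", "price"], prices.iloc[1, 0])   # same cell, two ways
```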


### Using pandas 5. groupby

- Split: divide the DataFrame into groups based on a chosen key
- Apply: reduce each group with an aggregate function such as __sum(), mean(), median()__
- Combine: build a new Series from the results of Apply (group_key: applied_value)

- Use `.groupby()`
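
The three steps can be written out by hand and compared with `groupby`. This is a minimal sketch on a made-up data set; the column names `region` and `confirmed` are assumptions for the example, not the covid columns.

```python
import pandas as pd

# Hypothetical miniature data set
cases = pd.DataFrame({
    "region": ["Europe", "Africa", "Europe", "Africa", "Americas"],
    "confirmed": [10, 20, 30, 40, 50],
})

# Split / apply / combine written out by hand
manual = {}
for region in cases["region"].unique():              # split
    group = cases[cases["region"] == region]
    manual[region] = group["confirmed"].sum()        # apply
manual = pd.Series(manual).sort_index()              # combine

# The same computation with groupby
grouped = cases.groupby("region")["confirmed"].sum()

print(grouped)
print(manual.to_dict() == grouped.to_dict())
```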

In [216]:
# Confirmed cases per WHO Region

# 1. Extract only the Confirmed column from covid.
# 2. Group it by covid's WHO Region column.

covid_by_region = covid["Confirmed"].groupby(by=covid["WHO Region"])
covid_by_region

<pandas.core.groupby.generic.SeriesGroupBy object at 0x0F959B38>

In [220]:
covid_by_region.sum()

WHO Region
Africa                    723207
Americas                 8839286
Eastern Mediterranean    1490744
Europe                   3299523
South-East Asia          1835297
Western Pacific           292428
Name: Confirmed, dtype: int64

In [243]:
# Average confirmed cases per country within each region
np.round(covid_by_region.mean(), 2)  # sum() / number of countries in each group

WHO Region
Africa                    15066.81
Americas                 252551.03
Eastern Mediterranean     67761.09
Europe                    58920.05
South-East Asia          183529.70
Western Pacific           18276.75
Name: Confirmed, dtype: float64

## Mission:
### 1. In the covid data, which country has the highest death rate per 100 cases (`Deaths / 100 Cases`)?

In [279]:
# Inspect the data, sorted in ascending order

covid.sort_values(by="Deaths / 100 Cases")

Unnamed: 0,Country/Region,Confirmed,Deaths,Recovered,Active,New cases,New deaths,New recovered,Deaths / 100 Cases,Recovered / 100 Cases,Deaths / 100 Recovered,Confirmed last week,1 week change,1 week % increase,WHO Region
55,Eritrea,265,0,191,74,2,0,2,0.00,72.08,0.00,251,14,5.58,Africa
114,Mongolia,289,0,222,67,1,0,4,0.00,76.82,0.00,287,2,0.70,Western Pacific
30,Cambodia,226,0,147,79,1,0,4,0.00,65.04,0.00,171,55,32.16,Western Pacific
19,Bhutan,99,0,86,13,4,0,1,0.00,86.87,0.00,90,9,10.00,South-East Asia
141,Saint Lucia,24,0,22,2,0,0,0,0.00,91.67,0.00,23,1,4.35,Americas
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
61,France,220352,30212,81212,108928,2551,17,267,13.71,36.86,37.20,214023,6329,2.96,Europe
85,Italy,246286,35112,198593,12581,168,5,147,14.26,80.64,17.68,244624,1662,0.68,Europe
16,Belgium,66428,9822,17452,39154,402,1,14,14.79,26.27,56.28,64094,2334,3.64,Europe
177,United Kingdom,301708,45844,1437,254427,688,7,3,15.19,0.48,3190.26,296944,4764,1.60,Europe


In [308]:
# Solution 1: keep the rows whose value equals the column maximum

covid[covid["Deaths / 100 Cases"] == covid["Deaths / 100 Cases"].max()]

Unnamed: 0,Country/Region,Confirmed,Deaths,Recovered,Active,New cases,New deaths,New recovered,Deaths / 100 Cases,Recovered / 100 Cases,Deaths / 100 Recovered,Confirmed last week,1 week change,1 week % increase,WHO Region
184,Yemen,1691,483,833,375,10,4,36,28.56,49.26,57.98,1619,72,4.45,Eastern Mediterranean


In [344]:
# Solution 2: apply the same mask to the Country/Region column only
print(covid["Country/Region"][covid["Deaths / 100 Cases"] == covid["Deaths / 100 Cases"].max()]); print()

184    Yemen
Name: Country/Region, dtype: object



In [346]:
# Solution 3: select the column with .loc, then pull out the name as a plain string
a = covid.loc[:, "Country/Region"][covid["Deaths / 100 Cases"] == covid["Deaths / 100 Cases"].max()]
list(a)[0]

'Yemen'
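
Another way to reach the same answer, offered here as an alternative sketch rather than the lesson's solution, is `idxmax`, which returns the index label of the row holding the maximum. The four rows below are copied from the covid outputs shown earlier.

```python
import pandas as pd

# A few rows excerpted from the covid table above
covid_small = pd.DataFrame({
    "Country/Region": ["Italy", "Belgium", "United Kingdom", "Yemen"],
    "Deaths / 100 Cases": [14.26, 14.79, 15.19, 28.56],
})

# idxmax returns the index label of the row holding the maximum
top_label = covid_small["Deaths / 100 Cases"].idxmax()
top_country = covid_small.loc[top_label, "Country/Region"]
print(top_country)

# nlargest keeps whole rows, which helps when ranking several countries
top_two = covid_small.nlargest(2, "Deaths / 100 Cases")
print(top_two)
```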

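### 2. In the covid data, list every country with no new cases whose WHO Region is 'Europe'.  
Hint: Applying two conditions by chaining them on one line may raise a warning.

In [361]:
# Solution 1: chain two boolean masks; this works but raises a UserWarning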
covid[covid["New cases"] == 0][covid["WHO Region"] == "Europe"]

  covid[covid["New cases"] == 0][covid["WHO Region"] == "Europe"]


Unnamed: 0,Country/Region,Confirmed,Deaths,Recovered,Active,New cases,New deaths,New recovered,Deaths / 100 Cases,Recovered / 100 Cases,Deaths / 100 Recovered,Confirmed last week,1 week change,1 week % increase,WHO Region
56,Estonia,2034,69,1923,42,0,0,1,3.39,94.54,3.59,2021,13,0.64,Europe
75,Holy See,12,0,12,0,0,0,0,0.0,100.0,0.0,12,0,0.0,Europe
95,Latvia,1219,31,1045,143,0,0,0,2.54,85.73,2.97,1192,27,2.27,Europe
100,Liechtenstein,86,1,81,4,0,0,0,1.16,94.19,1.23,86,0,0.0,Europe
113,Monaco,116,4,104,8,0,0,0,3.45,89.66,3.85,109,7,6.42,Europe
143,San Marino,699,42,657,0,0,0,0,6.01,93.99,6.39,699,0,0.0,Europe
157,Spain,272421,28432,150376,93613,0,0,0,10.44,55.2,18.91,264836,7585,2.86,Europe


In [386]:
# Solution 2: chaining a second mask right after .loc on the same line still raises a UserWarning
covid_Europe = covid.loc[covid["WHO Region"] == "Europe"][covid["New cases"] == 0]

covid_no_Europe = covid_Europe.loc[covid["New cases"] == 0]

covid_no_Europe

  covid_Europe = covid.loc[covid["WHO Region"] == "Europe"][covid["New cases"] == 0]


Unnamed: 0,Country/Region,Confirmed,Deaths,Recovered,Active,New cases,New deaths,New recovered,Deaths / 100 Cases,Recovered / 100 Cases,Deaths / 100 Recovered,Confirmed last week,1 week change,1 week % increase,WHO Region
56,Estonia,2034,69,1923,42,0,0,1,3.39,94.54,3.59,2021,13,0.64,Europe
75,Holy See,12,0,12,0,0,0,0,0.0,100.0,0.0,12,0,0.0,Europe
95,Latvia,1219,31,1045,143,0,0,0,2.54,85.73,2.97,1192,27,2.27,Europe
100,Liechtenstein,86,1,81,4,0,0,0,1.16,94.19,1.23,86,0,0.0,Europe
113,Monaco,116,4,104,8,0,0,0,3.45,89.66,3.85,109,7,6.42,Europe
143,San Marino,699,42,657,0,0,0,0,6.01,93.99,6.39,699,0,0.0,Europe
157,Spain,272421,28432,150376,93613,0,0,0,10.44,55.2,18.91,264836,7585,2.86,Europe


In [385]:
# Solution 3: split the conditions into two steps and use .loc for the second; no warning
covid_Europe = covid[covid["WHO Region"] == "Europe"]

covid_no_Europe = covid_Europe.loc[covid["New cases"] == 0]

print(list(covid_no_Europe["Country/Region"]))

covid_no_Europe

['Estonia', 'Holy See', 'Latvia', 'Liechtenstein', 'Monaco', 'San Marino', 'Spain']


Unnamed: 0,Country/Region,Confirmed,Deaths,Recovered,Active,New cases,New deaths,New recovered,Deaths / 100 Cases,Recovered / 100 Cases,Deaths / 100 Recovered,Confirmed last week,1 week change,1 week % increase,WHO Region
56,Estonia,2034,69,1923,42,0,0,1,3.39,94.54,3.59,2021,13,0.64,Europe
75,Holy See,12,0,12,0,0,0,0,0.0,100.0,0.0,12,0,0.0,Europe
95,Latvia,1219,31,1045,143,0,0,0,2.54,85.73,2.97,1192,27,2.27,Europe
100,Liechtenstein,86,1,81,4,0,0,0,1.16,94.19,1.23,86,0,0.0,Europe
113,Monaco,116,4,104,8,0,0,0,3.45,89.66,3.85,109,7,6.42,Europe
143,San Marino,699,42,657,0,0,0,0,6.01,93.99,6.39,699,0,0.0,Europe
157,Spain,272421,28432,150376,93613,0,0,0,10.44,55.2,18.91,264836,7585,2.86,Europe
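
The warning can also be avoided by combining both conditions into a single mask with `&`. The sketch below is an alternative to the solutions above, using four rows copied from the covid table with only the columns needed for the filter.

```python
import pandas as pd

# A few rows excerpted from the covid table above
covid_small = pd.DataFrame({
    "Country/Region": ["Estonia", "Spain", "Albania", "Western Sahara"],
    "New cases": [0, 0, 117, 0],
    "WHO Region": ["Europe", "Europe", "Europe", "Africa"],
})

# Each comparison needs its own parentheses because & binds tighter than ==
mask = (covid_small["New cases"] == 0) & (covid_small["WHO Region"] == "Europe")

result = covid_small[mask]
print(list(result["Country/Region"]))

# .loc accepts the same mask together with a column selection
names = covid_small.loc[mask, "Country/Region"]
print(names.tolist())
```

Because the mask is built from the same frame it filters, its index matches exactly and no reindexing is needed.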


### 3. Using this [data](https://www.kaggle.com/neuromusic/avocado-prices), print the highest average price (`AveragePrice`) of avocados for each region.

In [388]:
# Inspect the data
pd.read_csv("avocado.csv")

Unnamed: 0.1,Unnamed: 0,Date,AveragePrice,Total Volume,4046,4225,4770,Total Bags,Small Bags,Large Bags,XLarge Bags,type,year,region
0,0,2015-12-27,1.33,64236.62,1036.74,54454.85,48.16,8696.87,8603.62,93.25,0.0,conventional,2015,Albany
1,1,2015-12-20,1.35,54876.98,674.28,44638.81,58.33,9505.56,9408.07,97.49,0.0,conventional,2015,Albany
2,2,2015-12-13,0.93,118220.22,794.70,109149.67,130.50,8145.35,8042.21,103.14,0.0,conventional,2015,Albany
3,3,2015-12-06,1.08,78992.15,1132.00,71976.41,72.58,5811.16,5677.40,133.76,0.0,conventional,2015,Albany
4,4,2015-11-29,1.28,51039.60,941.48,43838.39,75.78,6183.95,5986.26,197.69,0.0,conventional,2015,Albany
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
18244,7,2018-02-04,1.63,17074.83,2046.96,1529.20,0.00,13498.67,13066.82,431.85,0.0,organic,2018,WestTexNewMexico
18245,8,2018-01-28,1.71,13888.04,1191.70,3431.50,0.00,9264.84,8940.04,324.80,0.0,organic,2018,WestTexNewMexico
18246,9,2018-01-21,1.87,13766.76,1191.92,2452.79,727.94,9394.11,9351.80,42.31,0.0,organic,2018,WestTexNewMexico
18247,10,2018-01-14,1.93,16205.22,1527.63,2981.04,727.01,10969.54,10919.54,50.00,0.0,organic,2018,WestTexNewMexico


In [405]:
# Solution: group AveragePrice by region and take the maximum of each group
data = pd.read_csv("avocado.csv")
data_region = data["AveragePrice"].groupby(by=data["region"])
data_region.max()

region
Albany                 2.13
Atlanta                2.75
BaltimoreWashington    2.28
Boise                  2.79
Boston                 2.19
BuffaloRochester       2.57
California             2.58
Charlotte              2.83
Chicago                2.30
CincinnatiDayton       2.20
Columbus               2.22
DallasFtWorth          1.90
Denver                 2.16
Detroit                2.08
GrandRapids            2.73
GreatLakes             1.98
HarrisburgScranton     2.27
HartfordSpringfield    2.68
Houston                1.92
Indianapolis           2.10
Jacksonville           2.99
LasVegas               3.03
LosAngeles             2.44
Louisville             2.29
MiamiFtLauderdale      3.05
Midsouth               2.17
Nashville              2.24
NewOrleansMobile       2.32
NewYork                2.65
Northeast              2.31
NorthernNewEngland     1.96
Orlando                2.87
Philadelphia           2.45
PhoenixTucson          2.62
Pittsburgh             1.83
Plains       

In [409]:
# Check the answer for one region by sorting its rows by AveragePrice
data[data["region"] == "Albany"].sort_values(by="AveragePrice")

Unnamed: 0.1,Unnamed: 0,Date,AveragePrice,Total Volume,4046,4225,4770,Total Bags,Small Bags,Large Bags,XLarge Bags,type,year,region
2846,38,2016-04-03,0.85,81694.23,676.27,70459.66,31.20,10527.10,10058.25,468.85,0.0,conventional,2016,Albany
2,2,2015-12-13,0.93,118220.22,794.70,109149.67,130.50,8145.35,8042.21,103.14,0.0,conventional,2015,Albany
7,7,2015-11-08,0.98,109428.33,703.75,101815.36,80.00,6829.22,6266.85,562.37,0.0,conventional,2015,Albany
43,43,2015-03-01,0.99,55595.74,629.46,45633.34,181.49,9151.45,8986.06,165.39,0.0,conventional,2015,Albany
46,46,2015-02-08,0.99,51253.97,1357.37,39111.81,163.25,10621.54,10113.10,508.44,0.0,conventional,2015,Albany
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9151,25,2015-07-05,2.04,1573.19,50.69,183.90,0.00,1338.60,1338.60,0.00,0.0,organic,2015,Albany
14770,29,2017-06-11,2.04,2719.24,21.33,248.87,0.00,2449.04,2449.04,0.00,0.0,organic,2017,Albany
9149,23,2015-07-19,2.08,1076.23,50.86,112.36,0.00,913.01,913.01,0.00,0.0,organic,2015,Albany
9153,27,2015-06-21,2.09,1053.73,17.59,107.87,0.00,928.27,928.27,0.00,0.0,organic,2015,Albany
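
The same grouping can be written on the DataFrame itself, and `idxmax` can recover the full row behind each maximum. This is a sketch on made-up rows shaped like `avocado.csv`; the prices and dates are hypothetical, not taken from the data set.

```python
import pandas as pd

# Hypothetical rows shaped like avocado.csv (values are made up)
avocado_small = pd.DataFrame({
    "Date": ["2017-01-01", "2017-01-08", "2017-01-01", "2017-01-08"],
    "AveragePrice": [1.10, 1.45, 2.05, 1.80],
    "region": ["Albany", "Albany", "Boston", "Boston"],
})

# Maximum price per region
max_price = avocado_small.groupby("region")["AveragePrice"].max()
print(max_price)

# idxmax gives the row label of each maximum, so the full rows can be recovered
top_rows = avocado_small.loc[avocado_small.groupby("region")["AveragePrice"].idxmax()]
print(top_rows[["region", "Date", "AveragePrice"]])
```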
