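### Collecting one-room (원룸) listing data from the Zigbang service
- A more involved collection workflow that chains several requests
- Input: a dong (neighborhood) name > output: listing details as a DataFrame

- Procedure
    - Dong name > latitude, longitude
    - Latitude, longitude > geohash code (area range)
    - Geohash > listing IDs
    - Listing IDs > listing details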

In [1]:
import pandas as pd
import requests

In [54]:
# 1. Dong name > latitude, longitude
query = "마포구 합정동"
url = f"https://apis.zigbang.com/v2/search?leaseYn=N&q={query}&serviceType=원룸"
response = requests.get(url)
data = response.json()['items'][0]
lat, lng = data["lat"], data["lng"]
lat, lng

(37.549537658691406, 126.90560913085938)
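
The cell above interpolates the Korean query straight into the URL, and indexing `['items'][0]` raises a bare `IndexError` when the search matches nothing. A minimal sketch of a more defensive variant follows. The helper names `build_search_url` and `first_coordinates` are hypothetical, and the sample payload only mimics the `lat`/`lng` fields seen in the output above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

SEARCH_ENDPOINT = "https://apis.zigbang.com/v2/search"  # endpoint taken from the cell above


def build_search_url(query):
    """Return the search URL with the query percent-encoded (hypothetical helper)."""
    params = {"leaseYn": "N", "q": query, "serviceType": "원룸"}
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


def first_coordinates(payload):
    """Extract (lat, lng) of the first search hit, failing with a clear message."""
    items = payload.get("items") or []
    if not items:
        raise ValueError("search returned no items; check the dong name")
    first = items[0]
    return first["lat"], first["lng"]


url = build_search_url("마포구 합정동")
# Decoding the query string recovers the original Korean text.
decoded = parse_qs(urlsplit(url).query)

# Sample payload shaped like the response above (values copied from the notebook output).
sample = {"items": [{"lat": 37.549537658691406, "lng": 126.90560913085938}]}
lat, lng = first_coordinates(sample)
```

With `requests`, passing the same dictionary as `params=` to `requests.get` produces the same encoding without building the string by hand.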

In [16]:
# 2. Latitude, longitude > geohash
# !pip install geohash2
import geohash2

# precision: size of the area; the larger the value, the smaller the area
code = geohash2.encode(lat, lng, precision=5)
code

'wydjx'
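
To make the `precision` comment concrete, here is a minimal pure-Python sketch of the standard geohash encoding. It is an illustration written for this walkthrough, not the `geohash2` source, and `geohash_encode` is a hypothetical name.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lng, precision=5):
    """Encode a coordinate by repeatedly halving the longitude/latitude ranges."""
    lat_range, lng_range = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, value = [], 0, 0
    use_lng = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, coord = (lng_range, lng) if use_lng else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord > mid:
            value = (value << 1) | 1  # upper half: emit a 1 bit
            rng[0] = mid
        else:
            value <<= 1  # lower half: emit a 0 bit
            rng[1] = mid
        use_lng = not use_lng
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


lat, lng = 37.549537658691406, 126.90560913085938
codes = [geohash_encode(lat, lng, precision=p) for p in (3, 5, 7)]
```

Each extra character halves the cell several more times, so for the same point a shorter code is always a prefix of a longer one.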

In [18]:
# 3. Geohash > listing IDs
url = f"https://apis.zigbang.com/v2/items?deposit_gteq=0&domain=zigbang&geohash={code}&needHasNoFiltered=true&rent_gteq=0&sales_type_in=전세|월세&service_type_eq=원룸"
response = requests.get(url)
response

<Response [200]>

In [20]:
data = response.json()["items"]
data[:3]

[{'lat': 37.52960683380756, 'lng': 126.89761319483509, 'item_id': 30462357},
 {'lat': 37.52958649415252, 'lng': 126.89698378323035, 'item_id': 30524930},
 {'lat': 37.52911073238503, 'lng': 126.89760543789517, 'item_id': 30629925}]
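
The coordinates in this output sit noticeably south of the point found in step 1, because the request returns everything inside the geohash cell rather than inside the dong. A rough sketch for checking which area a code covers, assuming the standard geohash bit layout (`geohash_bbox` is a hypothetical helper, not part of `geohash2`):

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet


def geohash_bbox(code):
    """Return ((lat_min, lat_max), (lng_min, lng_max)) for a geohash cell."""
    lat_range, lng_range = [-90.0, 90.0], [-180.0, 180.0]
    use_lng = True  # bits alternate, starting with longitude
    for char in code:
        value = BASE32.index(char)
        for shift in range(4, -1, -1):  # most significant bit first
            bit = (value >> shift) & 1
            rng = lng_range if use_lng else lat_range
            mid = (rng[0] + rng[1]) / 2
            if bit:
                rng[0] = mid  # keep the upper half
            else:
                rng[1] = mid  # keep the lower half
            use_lng = not use_lng
    return tuple(lat_range), tuple(lng_range)


(lat_min, lat_max), (lng_min, lng_max) = geohash_bbox("wydjx")

# First listing from the output above.
item_lat, item_lng = 37.52960683380756, 126.89761319483509
inside = lat_min <= item_lat <= lat_max and lng_min <= item_lng <= lng_max
```

At precision 5 the cell spans about 0.044 degrees on each side, which covers several neighborhoods. That is why the results are later filtered by address.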

In [23]:
# list comprehension - handy for building simple lists
# (the loop variable is named `item` so it does not shadow `data`)
ids = [item["item_id"] for item in data]
ids[:3]

[30462357, 30524930, 30629925]

In [27]:
# Square only the odd numbers and collect the results in a list
result = [n**2 for n in range(10) if n % 2]
result

[1, 9, 25, 49, 81]
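
For comparison, the comprehension above can be unrolled into an explicit loop. This is only an illustrative sketch; the variable names and the three sample rows (IDs copied from the earlier output) are chosen for the example.

```python
# Explicit loop: append the square of every odd number below 10.
squares_loop = []
for n in range(10):
    if n % 2:  # n % 2 is 1 (truthy) for odd numbers
        squares_loop.append(n ** 2)

# Equivalent list comprehension: expression first, then the loop, then the filter.
squares_comp = [n ** 2 for n in range(10) if n % 2]

# The same pattern extracts one field from a list of dicts.
rows = [{"item_id": 30462357}, {"item_id": 30524930}, {"item_id": 30629925}]
ids = [row["item_id"] for row in rows]
```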

In [28]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


In [31]:
# 4. Listing IDs > listing details
url = "https://apis.zigbang.com/v2/items/list"
params = {"domain": "zigbang",
          "withCoalition": "true",
          "item_ids": ids}
response = requests.post(url, data=params)  # params is sent as the form-encoded body
response

<Response [200]>

In [55]:
items = response.json()["items"]
item_df = pd.DataFrame(items)
item_df.tail(2)

Unnamed: 0,id,type,name,hint,description,lat,lng,zoom,polygon,_score,_source,zoom_level,zoom_level_v2
0,4021,address,합정동,,서울시 마포구 합정동,37.549538,126.905609,5,[],,"{'name_length': 3, 'local1': '서울시', 'local2': ...","{'google': 15, 'daum': 4}","{'app': 5, 'web': 4}"


In [36]:
item_df

Unnamed: 0,section_type,item_id,images_thumbnail,sales_type,sales_title,deposit,rent,size_m2,공급면적,전용면적,...,is_zzim,status,service_type,tags,address1,address2,address3,manage_cost,reg_date,is_new
0,,30462357,https://ic.zigbang.com/ic/items/30462357/1.jpg,전세,전세,8000,0,19.83,"{'m2': 19.83, 'p': '6'}","{'m2': 19.83, 'p': '6'}",...,False,True,원룸,[],서울시 영등포구 당산동4가,,,13,2022-02-18T17:43:12+09:00,False
1,,30524930,https://ic.zigbang.com/ic/items/30524930/1.jpg,전세,전세,7000,0,16.53,"{'m2': 16.53, 'p': '5'}","{'m2': 16.53, 'p': '5'}",...,False,True,원룸,[추천],서울시 영등포구 당산동4가,,,13,2022-02-07T16:03:16+09:00,False
2,,30629925,https://ic.zigbang.com/ic/items/30629925/1.jpg,전세,전세,7000,0,19.83,"{'m2': 19.83, 'p': '6'}","{'m2': 19.83, 'p': '6'}",...,False,True,원룸,[추천],서울시 영등포구 당산동4가,,,12,2022-02-19T16:52:02+09:00,True
3,,30681505,https://ic.zigbang.com/ic/items/30681505/1.jpg,전세,전세,7000,0,19.83,"{'m2': 19.83, 'p': '6'}","{'m2': 19.83, 'p': '6'}",...,False,True,원룸,[추천],서울시 영등포구 당산동4가,,,13,2022-02-17T15:58:52+09:00,False
4,,30494154,https://ic.zigbang.com/ic/items/30494154/1.jpg,전세,전세,6000,0,19.83,"{'m2': 19.83, 'p': '6'}","{'m2': 19.83, 'p': '6'}",...,False,True,원룸,[],서울시 영등포구 양평동3가,,,12,2022-02-05T11:14:17+09:00,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
911,,30653716,https://ic.zigbang.com/ic/items/30653716/1.jpg,월세,월세,4000,27,15.00,"{'m2': 15, 'p': '4.5'}","{'m2': 15, 'p': '4.5'}",...,False,True,원룸,[],서울시 마포구 성산동,,,8,2022-02-19T12:13:37+09:00,False
912,,30673351,https://ic.zigbang.com/ic/items/30673351/1.jpg,월세,월세,9000,20,19.83,"{'m2': 19.83, 'p': '6'}","{'m2': 19.83, 'p': '6'}",...,False,True,원룸,[추천],서울시 마포구 성산동,,,8,2022-02-17T10:27:46+09:00,False
913,,30629059,https://ic.zigbang.com/ic/items/30629059/1.jpg,전세,전세,26000,0,65.00,"{'m2': 65, 'p': '19.7'}","{'m2': 60, 'p': '18.2'}",...,False,True,빌라,[],서울시 마포구 성산동,,,0,2022-02-14T14:50:13+09:00,False
914,,30455640,https://ic.zigbang.com/ic/items/30455640/1.jpg,전세,전세,28000,0,33.58,"{'m2': 33.58, 'p': '10.2'}","{'m2': 23.98, 'p': '7.3'}",...,False,True,빌라,[추천],서울시 마포구 중동,,,3,2022-02-03T12:00:40+09:00,False


In [40]:
pd.options.display.max_columns  # shows the current value; assign to change it, e.g. = 30
# pd.options.display.max_rows = 60  # or more

20

In [41]:
item_df.columns

Index(['section_type', 'item_id', 'images_thumbnail', 'sales_type',
       'sales_title', 'deposit', 'rent', 'size_m2', '공급면적', '전용면적', '계약면적',
       'room_type_title', 'floor', 'floor_string', 'building_floor', 'title',
       'is_first_movein', 'room_type', 'address', 'random_location', 'is_zzim',
       'status', 'service_type', 'tags', 'address1', 'address2', 'address3',
       'manage_cost', 'reg_date', 'is_new'],
      dtype='object')

In [43]:
columns = [
    "item_id", "sales_title", "deposit", "rent", "size_m2",
    "floor", "building_floor", "title", "room_type", "address",
    "service_type", "address1", "manage_cost", "reg_date", "is_new"]

In [48]:
query

'마포구 합정동'

In [47]:
result_df = item_df[columns]
result_df = result_df[result_df["address"].str.contains(query)]
result_df = result_df.reset_index(drop=True)
result_df.head()

Unnamed: 0,item_id,sales_title,deposit,rent,size_m2,floor,building_floor,title,room_type,address,service_type,address1,manage_cost,reg_date,is_new
0,30633740,전세,22500,0,36.0,반지하,3,올수리중 풀옵션 투룸,4,마포구 합정동,빌라,서울시 마포구 합정동,0,2022-02-14T17:17:39+09:00,False
1,30628139,전세,22000,0,82.23,1,3,💕신축리모델링💕귀한넓은투룸💕합정초역세권💕보면반해💕,4,마포구 합정동,빌라,서울시 마포구 합정동,2,2022-02-14T14:25:32+09:00,False
2,30633429,전세,22000,0,39.67,1,3,💥합정대출가능💖금방나갈집💖주차가능🌈채광맛집👍,4,마포구 합정동,빌라,서울시 마포구 합정동,2,2022-02-14T17:07:22+09:00,False
3,30647837,전세,22000,0,39.67,1,3,대출OK🎉합정역 넘 예쁜 투룸 전세🎉,4,마포구 합정동,빌라,서울시 마포구 합정동,2,2022-02-19T12:53:08+09:00,False
4,30650042,전세,22000,0,46.28,1,3,🎶 [합정역 10분] 🎶[대출 가능],4,마포구 합정동,빌라,서울시 마포구 합정동,2,2022-02-15T16:19:47+09:00,False


In [50]:
# Deposit of at most 100 million KRW and monthly rent under 1 million KRW
# (amounts are in units of 10,000 KRW)
result_df[(result_df["deposit"] <= 10000) & (result_df["rent"] < 100)]

Unnamed: 0,item_id,sales_title,deposit,rent,size_m2,floor,building_floor,title,room_type,address,service_type,address1,manage_cost,reg_date,is_new
45,30546561,월세,1000,85,33.06,2,3,🎉조용한 주택가 투룸🎉,4,마포구 합정동,빌라,서울시 마포구 합정동,5.0,2022-02-09T15:26:34+09:00,False
46,30603311,월세,1000,79,39.67,2,3,💥한강앞! 합정 투룸💥갓성비 안방넓은 깔끔한 집💥,4,마포구 합정동,빌라,서울시 마포구 합정동,11.2,2022-02-14T14:16:50+09:00,False
47,30662028,월세,1000,85,36.36,2,3,🌈🌈채광좋은 화이트톤 투룸!,4,마포구 합정동,빌라,서울시 마포구 합정동,5.0,2022-02-19T18:58:44+09:00,True
48,30698799,월세,1000,85,35.2,2,3,🎶 [한강산책 8분거리] 🎶[귀한 투룸],4,마포구 합정동,빌라,서울시 마포구 합정동,5.0,2022-02-21T14:48:48+09:00,True
50,30682604,월세,300,45,16.53,3,5,🎉합정 풀옵션 원룸🎉🎉,1,마포구 합정동,원룸,서울시 마포구 합정동,3.0,2022-02-21T14:34:55+09:00,True
51,30501645,월세,300,45,19.83,3,3,⭕합정동 채광좋은원룸 YG사옥인근⭕,1,마포구 합정동,원룸,서울시 마포구 합정동,5.0,2022-02-21T14:29:57+09:00,True
68,30662860,월세,5000,70,59.5,2,3,큰방2개 별도거실 깨끗함,4,마포구 합정동,빌라,서울시 마포구 합정동,0.0,2022-02-16T13:53:22+09:00,False
71,30687260,월세,3000,90,32.63,3,5,🌸반려동물가능🌸가성비투룸🌸현재짐다뺌🌸,4,마포구 합정동,빌라,서울시 마포구 합정동,5.0,2022-02-17T23:42:38+09:00,False
72,30552358,월세,1000,45,16.53,2,5,🌸극 가성비🌸즉시입주도가능🌸이가격에 이방🌸,1,마포구 합정동,원룸,서울시 마포구 합정동,5.0,2022-02-09T11:07:47+09:00,False
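
The address filter and the price filter used above can be combined into one helper. The sketch below is a hedged illustration, not the notebook's code: `filter_listings`, the constant names, and the small `sample_df` (values loosely based on the outputs above, plus an invented row with a missing address) are assumptions. It passes `regex=False` so the query is matched literally and `na=False` so missing addresses count as non-matches.

```python
import pandas as pd

DEPOSIT_MAX = 10000  # 100 million KRW, since amounts are in units of 10,000 KRW
RENT_MAX = 100       # 1 million KRW

# Illustrative stand-in for result_df; the last row is invented for the example.
sample_df = pd.DataFrame({
    "item_id": [30546561, 30682604, 30633740, 1],
    "deposit": [1000, 300, 22500, 500],
    "rent": [85, 45, 0, 40],
    "address": ["마포구 합정동", "마포구 합정동", "마포구 합정동", None],
})


def filter_listings(df, query, deposit_max=DEPOSIT_MAX, rent_max=RENT_MAX):
    """Keep rows whose address contains `query` and whose prices are within limits."""
    # Literal substring match; rows without an address are treated as non-matches.
    in_area = df["address"].str.contains(query, regex=False, na=False)
    # Each comparison needs its own parentheses because & binds tighter than <=.
    affordable = (df["deposit"] <= deposit_max) & (df["rent"] < rent_max)
    return df[in_area & affordable].reset_index(drop=True)


filtered = filter_listings(sample_df, "마포구 합정동")
```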


In [63]:
def oneroom(address):
    """Return listings whose address contains `address`, as a DataFrame."""
    # 1. Dong name > latitude, longitude
    url = f"https://apis.zigbang.com/v2/search?leaseYn=N&q={address}&serviceType=원룸"
    response = requests.get(url)
    data = response.json()["items"][0]
    lat, lng = data["lat"], data["lng"]

    # 2. Latitude, longitude > geohash
    code = geohash2.encode(lat, lng, precision=5)

    # 3. Geohash > listing IDs
    url = (
        "https://apis.zigbang.com/v2/items?deposit_gteq=0&domain=zigbang"
        f"&geohash={code}&needHasNoFiltered=true&rent_gteq=0"
        "&sales_type_in=전세|월세&service_type_eq=원룸"
    )
    response = requests.get(url)
    ids = [item["item_id"] for item in response.json()["items"]]

    # 4. Listing IDs > listing details (only the first 998 IDs are sent)
    url = "https://apis.zigbang.com/v2/items/list"
    params = {
        "domain": "zigbang",
        "withCoalition": "true",
        "item_ids": ids[:998],
    }
    response = requests.post(url, data=params)
    item_df = pd.DataFrame(response.json()["items"])

    columns = [
        'item_id', 'sales_title', 'deposit', 'rent', 'size_m2',
        'floor', 'building_floor', 'title', 'room_type', 'address',
        'service_type', 'address1', 'manage_cost', 'reg_date', 'is_new',
    ]

    # Keep only the listings whose address contains the requested dong name
    result_df = item_df[columns]
    result_df = result_df[result_df["address"].str.contains(address)]
    return result_df.reset_index(drop=True)
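
`oneroom` sends only `ids[:998]`, so any listings beyond that slice are silently dropped. Assuming the slice exists because the endpoint limits how many IDs one request may carry (the notebook does not say), the IDs could instead be sent in batches. In the sketch below, `chunked` and `fetch_all` are hypothetical helpers, and `fake_fetch` stands in for the POST call so the example runs without network access.

```python
def chunked(values, size):
    """Yield consecutive slices of `values` with at most `size` elements."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def fetch_all(ids, fetch, batch_size=998):
    """Call `fetch` once per batch and concatenate the returned item lists."""
    items = []
    for batch in chunked(ids, batch_size):
        items.extend(fetch(batch))
    return items


def fake_fetch(batch):
    """Stand-in for the real request: echoes one dict per requested ID."""
    return [{"item_id": item_id} for item_id in batch]


all_ids = list(range(2500))
items = fetch_all(all_ids, fake_fetch)
batch_sizes = [len(batch) for batch in chunked(all_ids, 998)]
```

In the function above, `fetch` would wrap the `requests.post` call and return `response.json()["items"]` for each batch.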

In [65]:
oneroom("망원동").head()

Unnamed: 0,item_id,sales_title,deposit,rent,size_m2,floor,building_floor,title,room_type,address,service_type,address1,manage_cost,reg_date,is_new
0,30483493,전세,23000,0,18.99,6,6,"🦋합정역,신축1.5룸가성비짱,한강공원도보,보증보험OK",2,마포구 망원동,원룸,서울시 마포구 망원동,5,2022-02-04T15:40:23+09:00,False
1,30608827,전세,23000,0,42.98,6,6,방1개 별도거실 주방 화장실 다용도실 분리형원룸 입니다,4,마포구 망원동,빌라,서울시 마포구 망원동,7,2022-02-12T16:27:27+09:00,False
2,30578522,월세,12000,10,12.21,3,5,💥LH 가능 깨끗한 방💥,1,마포구 망원동,원룸,서울시 마포구 망원동,5,2022-02-15T18:54:14+09:00,False
3,30726074,월세,12000,10,13.22,3,5,🎶 [LH 가능!] 🎶 [깔끔한 방!],1,마포구 망원동,원룸,서울시 마포구 망원동,5,2022-02-21T11:45:06+09:00,True
4,30340227,월세,1000,49,23.14,3,5,⭕망원역세권 7평원룸 내부깔끔⭕,1,마포구 망원동,원룸,서울시 마포구 망원동,3,2022-01-22T14:38:50+09:00,False
