## Company Overview: Multiple Companies
+ Author: 임경호
+ Financial Supervisory Service OPEN DART https://opendart.fss.or.kr/

In [8]:
import pandas as pd
import datetime
import os

# Directory where the data files are stored
path_dir = "D:/PythonProject/data-gatherer/dart_fs_notes/company/"
file_name = "corp_codes_all.csv"
df_corp = pd.read_csv(path_dir + file_name, dtype=object)
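
The `dtype=object` argument matters because corp codes such as `00925888` start with zeros. A minimal sketch of the difference, using a hypothetical two-row sample (the `corp_name` values are made up for illustration):

```python
import io

import pandas as pd

# Hypothetical sample; only the corp_code values come from the notebook output.
sample_csv = "corp_code,corp_name\n00925888,Example A\n01142419,Example B\n"

# Default type inference parses the codes as integers and drops leading zeros.
df_inferred = pd.read_csv(io.StringIO(sample_csv))

# dtype=object keeps every column as the original text.
df_text = pd.read_csv(io.StringIO(sample_csv), dtype=object)

print(df_inferred["corp_code"].tolist())  # [925888, 1142419]
print(df_text["corp_code"].tolist())      # ['00925888', '01142419']
```

Reading the codes as text keeps them usable as the `corp_code` request parameter without re-padding.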

In [9]:
# Current time as YYYYMMDD_HHMMSS
def now_dt_str():
    now = datetime.datetime.now()
    dt = now.strftime('%Y%m%d_%H%M%S')
    return dt
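
Because every field in this format is zero-padded, file names built from it sort chronologically as plain strings. A small sketch, assuming a hypothetical helper `format_dt` that takes the datetime as an argument so the result is reproducible (the two dates are arbitrary):

```python
import datetime


def format_dt(dt):
    # Same format string as now_dt_str(), applied to a given datetime.
    return dt.strftime('%Y%m%d_%H%M%S')


earlier = format_dt(datetime.datetime(2023, 1, 5, 9, 30, 0))
later = format_dt(datetime.datetime(2023, 11, 20, 14, 5, 7))

print(earlier)  # 20230105_093000
print(later)    # 20231120_140507

# String order matches chronological order.
print(sorted([later, earlier]) == [earlier, later])  # True
```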

In [10]:
list_all_codes = df_corp['corp_code'].unique()
len(list_all_codes)

103859

In [11]:
file_list = os.listdir(path_dir)

df_own_corp_info = pd.DataFrame()
list_own_codes = []   
for file_name in file_list:
    # Load any file that already holds fetched company overview data
    if 'corp_info_' in file_name:
        df = pd.read_csv(path_dir + file_name, dtype=object)
        df_own_corp_info = pd.concat([df_own_corp_info, df])

if not df_own_corp_info.empty:
    list_own_codes = df_own_corp_info['corp_code'].unique()    

len(list_own_codes)

50697
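
The cell above grows a DataFrame inside the loop. One alternative is to collect the frames in a list and concatenate once. The sketch below assumes a hypothetical helper `load_own_codes` and matches files with `glob`, which is stricter than the substring test above: it requires the `corp_info_` prefix and a `.csv` suffix. The demo files and codes are invented.

```python
import glob
import os
import tempfile

import pandas as pd


def load_own_codes(dir_path):
    """Return the unique corp_code values found in corp_info_*.csv files."""
    pattern = os.path.join(dir_path, "corp_info_*.csv")
    frames = [pd.read_csv(p, dtype=object) for p in sorted(glob.glob(pattern))]
    if not frames:
        return []
    # Concatenate once instead of once per file.
    df_all = pd.concat(frames, ignore_index=True)
    return df_all["corp_code"].unique().tolist()


# Demo in a throwaway directory with hypothetical file names and rows.
with tempfile.TemporaryDirectory() as tmp_dir:
    pd.DataFrame({"corp_code": ["00000001", "00000002"]}).to_csv(
        os.path.join(tmp_dir, "corp_info_20230101_000000.csv"), index=False)
    pd.DataFrame({"corp_code": ["00000002", "00000003"]}).to_csv(
        os.path.join(tmp_dir, "corp_info_20230102_000000.csv"), index=False)
    own_codes = load_own_codes(tmp_dir)

print(own_codes)  # ['00000001', '00000002', '00000003']
```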

In [12]:
# Exclude codes that already have corp info from the target list
list_target_codes = list(set(list_all_codes) - set(list_own_codes))
len(list_target_codes)

53163
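
Two details of the set difference are worth noting. First, a `set` does not keep the original order, so the target list may be ordered differently on each run. Second, 103859 - 50697 is 53162, not 53163; one possible explanation (a guess, not verified against the data) is that one value in `list_own_codes` does not appear in `list_all_codes`, for example a missing `corp_code` from a saved error response. The sketch below uses invented codes to show an order-preserving filter and how a stray value produces the same off-by-one.

```python
# Hypothetical codes for illustration only.
all_codes = ["00000003", "00000001", "00000002", "00000004"]
own_codes = ["00000001", "00000004", "99999999"]  # last one is not in all_codes

# Order-preserving filter: keep codes that have not been fetched yet.
own_set = set(own_codes)
target_codes = [code for code in all_codes if code not in own_set]
print(target_codes)  # ['00000003', '00000002']

# A plain subtraction of the counts under-counts by the number of stray values.
print(len(all_codes) - len(own_codes))  # 1
print(len(target_codes))                # 2
```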

* Split the corp_code list into chunks

In [13]:
list_cnt = len(list_target_codes)
n = 1000
list_of_lists = [list_target_codes[i * n:(i + 1) * n] for i in range((list_cnt + n - 1) // n )] 
len(list_of_lists)

54
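
The same split can be written with a step in `range`, which avoids the ceiling arithmetic. A minimal sketch, assuming a hypothetical helper `chunk` and synthetic placeholder codes of the same count as above:

```python
def chunk(items, size):
    """Split items into consecutive lists of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# Synthetic placeholder codes; only the count (53163) comes from the notebook.
codes = [f"{i:08d}" for i in range(53163)]
chunks = chunk(codes, 1000)

print(len(chunks))      # 54
print(len(chunks[-1]))  # 163
```

The last chunk holds the 163 leftover codes, so no code is dropped.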

* Fetch company overview data

In [7]:
from tqdm import tqdm
from time import sleep
import requests

# Company Info   
url = 'https://opendart.fss.or.kr/api/company.json'
api_key = 'f2e08d4ed3de0ba3d5cbf59c04c223e02b1751a2'

for corp_codes in list_of_lists:
    corp_list = []
    for corp_code in tqdm(corp_codes):
        sleep(0.1)
        params = {
            'corp_code': corp_code,
            'crtfc_key': api_key,
        }
        try:
            response = requests.get(url, params=params)     
            if response.status_code == 200:     # HTTP 200 OK
                json_data = response.json()
                if json_data['status'] == '020':    # usage limit exceeded
                    print(json_data['message'])
                    raise Exception
                else:
                    corp_list.append(json_data)
            else:
                print("URL GET Error", corp_code)
                break
        except Exception as e:
            print(e)
            break
    if len(corp_list) == 0:
        break
    else:
        # Convert to a DataFrame
        df_save_corp = pd.DataFrame(corp_list)
        # Cast the code columns to string, then save as a CSV file
        df_save_corp = df_save_corp.astype({'corp_code' : 'string', 'stock_code' : 'string'})
        file_path = path_dir + 'corp_info_' + now_dt_str() + '.csv'
        df_save_corp.to_csv(file_path, index=False)

100%|██████████| 1000/1000 [04:28<00:00,  3.72it/s]
100%|██████████| 1000/1000 [04:28<00:00,  3.73it/s]
100%|██████████| 1000/1000 [04:33<00:00,  3.66it/s]
100%|██████████| 1000/1000 [04:31<00:00,  3.68it/s]
 45%|████▌     | 454/1000 [02:04<02:29,  3.64it/s]


HTTPSConnectionPool(host='opendart.fss.or.kr', port=443): Max retries exceeded with url: /api/company.json?corp_code=01142419&crtfc_key=f2e08d4ed3de0ba3d5cbf59c04c223e02b1751a2 (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:997)')))


100%|██████████| 1000/1000 [04:26<00:00,  3.75it/s]
 88%|████████▊ | 884/1000 [03:55<00:30,  3.75it/s]


HTTPSConnectionPool(host='opendart.fss.or.kr', port=443): Max retries exceeded with url: /api/company.json?corp_code=00925888&crtfc_key=f2e08d4ed3de0ba3d5cbf59c04c223e02b1751a2 (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:997)')))


100%|██████████| 1000/1000 [04:22<00:00,  3.81it/s]
100%|██████████| 1000/1000 [04:29<00:00,  3.71it/s]
100%|██████████| 1000/1000 [04:29<00:00,  3.71it/s]
100%|██████████| 1000/1000 [04:28<00:00,  3.72it/s]
100%|██████████| 1000/1000 [04:30<00:00,  3.70it/s]
100%|██████████| 1000/1000 [04:27<00:00,  3.73it/s]
 36%|███▌      | 357/1000 [01:58<03:33,  3.01it/s]


HTTPSConnectionPool(host='opendart.fss.or.kr', port=443): Max retries exceeded with url: /api/company.json?corp_code=00908012&crtfc_key=f2e08d4ed3de0ba3d5cbf59c04c223e02b1751a2 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x000002CB78548340>, 'Connection to opendart.fss.or.kr timed out. (connect timeout=None)'))


 73%|███████▎  | 733/1000 [03:16<01:11,  3.72it/s]


HTTPSConnectionPool(host='opendart.fss.or.kr', port=443): Max retries exceeded with url: /api/company.json?corp_code=01174861&crtfc_key=f2e08d4ed3de0ba3d5cbf59c04c223e02b1751a2 (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:997)')))


 90%|█████████ | 901/1000 [03:58<00:26,  3.78it/s]


HTTPSConnectionPool(host='opendart.fss.or.kr', port=443): Max retries exceeded with url: /api/company.json?corp_code=00357157&crtfc_key=f2e08d4ed3de0ba3d5cbf59c04c223e02b1751a2 (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:997)')))


  1%|          | 9/1000 [00:02<04:27,  3.70it/s]


HTTPSConnectionPool(host='opendart.fss.or.kr', port=443): Max retries exceeded with url: /api/company.json?corp_code=01457010&crtfc_key=f2e08d4ed3de0ba3d5cbf59c04c223e02b1751a2 (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:997)')))


100%|██████████| 1000/1000 [04:26<00:00,  3.75it/s]
100%|██████████| 1000/1000 [04:25<00:00,  3.77it/s]
100%|██████████| 1000/1000 [04:27<00:00,  3.73it/s]
100%|██████████| 1000/1000 [04:28<00:00,  3.72it/s]
100%|██████████| 1000/1000 [04:35<00:00,  3.63it/s]
 58%|█████▊    | 577/1000 [02:44<02:00,  3.50it/s]


HTTPSConnectionPool(host='opendart.fss.or.kr', port=443): Max retries exceeded with url: /api/company.json?corp_code=01100033&crtfc_key=f2e08d4ed3de0ba3d5cbf59c04c223e02b1751a2 (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:997)')))


  8%|▊         | 84/1000 [00:24<04:30,  3.38it/s]


사용한도를 초과하였습니다. (Usage limit exceeded.)



  0%|          | 0/1000 [00:00<?, ?it/s]

사용한도를 초과하였습니다. (Usage limit exceeded.)
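
In the run above, a single SSL or timeout error ends the current chunk early, and the remaining codes wait for a later run. One possible mitigation is to retry transient errors with a growing delay. The sketch below is not part of the original notebook: `fetch_with_retry`, its parameters, and the `flaky_fetch` stand-in are all hypothetical, and the network call is simulated so the example runs offline.

```python
import time


def fetch_with_retry(fetch, corp_code, retries=3, backoff=1.0,
                     retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call fetch(corp_code); on a listed exception, wait and try again.

    The delay doubles after each failure: backoff, 2*backoff, 4*backoff, ...
    After `retries` failed retries the last exception is re-raised.
    """
    for attempt in range(retries + 1):
        try:
            return fetch(corp_code)
        except retry_on:
            if attempt == retries:
                raise
            sleep(backoff * 2 ** attempt)


# Stand-in for the network call: fails twice, then succeeds.
calls = {"n": 0}


def flaky_fetch(corp_code):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated SSL EOF")
    return {"corp_code": corp_code}


waits = []  # record the delays instead of actually sleeping
result = fetch_with_retry(flaky_fetch, "01142419", sleep=waits.append)

print(result)  # {'corp_code': '01142419'}
print(waits)   # [1.0, 2.0]
```

With `requests`, `fetch` could wrap `requests.get(url, params=params, timeout=...)`, and `retry_on` would need to be `(requests.exceptions.RequestException,)`, because the `requests` exception classes do not derive from the built-in `ConnectionError`. The usage-limit case (status `'020'`) should not be retried, since waiting a few seconds does not restore the quota.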




