# DART Data Importer 

Fetches companies' disclosure filings, together with their minute-level disclosure times, by combining the DART Open API with a DB of recent announcements.

```python
# -*- coding: utf-8 -*-

from urllib.request import urlopen

import pandas as pd
import numpy as np
import pickle

import json
import xml.etree.ElementTree as elemTree
import sys
```

## Connecting to the DART Open API

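The DART API offers four categories of information, each of which is divided further into individual APIs:
1. Disclosure information
2. Key information from business reports
3. Financial information of listed companies
4. Comprehensive equity-disclosure information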

```python
# Read the Open DART API key; strip() drops any trailing newline from the file.
with open('./DART_password.txt', 'r') as f:
    API_KEY = f.read().strip()
```

```python
crtfc_key = '?crtfc_key=' + API_KEY
```

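1. Disclosure information:
    - Disclosure search: searches disclosure reports by various conditions such as disclosure type, company, and date.
    - Company overview: provides overview information on companies registered with DART.
    - Original disclosure document file: provides the original file of a disclosure report.
    - Unique company code: provides, as a file, the unique code, company name, CEO name, stock code, and date of last change for each disclosing company registered with DART.

For further details, see the API documentation: https://opendart.fss.or.kr/guide/detail.do?apiGrpCd=DS001&apiId=2019001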

```python
# Base URLs for the disclosure-information APIs

DART_list_json = 'https://opendart.fss.or.kr/api/list.json'  # disclosure search
DART_company_json = 'https://opendart.fss.or.kr/api/company.json'  # company overview
DART_document_xml = 'https://opendart.fss.or.kr/api/document.xml'  # original disclosure document file
DART_corpCode_xml = 'https://opendart.fss.or.kr/api/corpCode.xml'  # unique company code
```

```python
from urllib.parse import urlencode

def DART_annc_info(info_type, **kwargs):
    """Create a request URL that includes the given parameters.

    Args:
        info_type (str): Type of info to request
            ('list', 'company', 'document' or 'corpCode').

    Kwargs:
        Query parameters for the chosen endpoint (e.g. corp_code, bgn_de,
        end_de). They differ per endpoint; refer to the API doc linked above.

    Returns:
        str or None.
        A complete URL to hand over to the Open DART API, or None if
        info_type is not recognized.
    """
    # urlencode percent-encodes the values, so non-ASCII parameters
    # (e.g. Korean company names) do not break urlopen.
    parameters = '&' + urlencode(kwargs) if kwargs else ''

    if info_type == 'list':
        return DART_list_json + crtfc_key + parameters
    elif info_type == 'company':
        return DART_company_json + crtfc_key + parameters
    elif info_type == 'document':
        return DART_document_xml + crtfc_key + parameters
    elif info_type == 'corpCode':
        return DART_corpCode_xml + crtfc_key + parameters
    else:
        print('Wrong info_type. Choose from:')
        print('''
        1. "list": disclosure search
        2. "company": company overview
        3. "document": original disclosure document file
        4. "corpCode": unique company code
        ''')
```

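```python
def DART_get_response(request_url):
    """Get a response from the Open DART API.

    Args:
        request_url (str): The URL to request.

    Returns:
        tuple or None.
        (
            (str) type of the object, 'json' or 'xml',
            parsed JSON (dict) or XML (Element) object
        )
        None if the body is neither valid JSON nor valid XML.

    Note:
        On success, document.xml and corpCode.xml return ZIP archives
        (binary), which this function does not handle.
    """
    with urlopen(request_url) as req:
        response = req.read().decode('utf-8')

    try:
        result = ('json', json.loads(response))
    except json.JSONDecodeError:
        # Not JSON: fall back to XML (e.g. an error message from an .xml endpoint).
        try:
            result = ('xml', elemTree.fromstring(response))
        except elemTree.ParseError:
            print("An error occurred: ", sys.exc_info()[0])
            return None

    return result
```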
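The unique-company-code endpoint (`corpCode.xml`) delivers its list as a file. According to the Open DART guide, a successful call returns a ZIP archive rather than text, so it cannot go through `DART_get_response`, which decodes the body as UTF-8. Below is a minimal sketch that parses the raw bytes instead. It assumes the archive holds a single XML file whose `<list>` records carry `corp_code`, `corp_name`, `stock_code` and `modify_date` child elements; those element names and the helper `parse_corp_code_zip` are assumptions for illustration, not part of the original notebook, and any other fields in the file are ignored.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

import pandas as pd


def parse_corp_code_zip(zip_bytes):
    """Parse the (assumed) corpCode.xml ZIP payload into a DataFrame.

    Assumes the archive holds one XML file with <list> records whose
    children include corp_code, corp_name, stock_code and modify_date.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        # Take the first member rather than hard-coding its file name.
        xml_bytes = zf.read(zf.namelist()[0])

    root = ET.fromstring(xml_bytes)
    fields = ['corp_code', 'corp_name', 'stock_code', 'modify_date']
    rows = [
        # findtext returns None for a missing child; strip() drops padding spaces.
        {f: (item.findtext(f) or '').strip() for f in fields}
        for item in root.iter('list')
    ]
    # dtype=str keeps codes such as '00919966' from losing their leading zeros.
    return pd.DataFrame(rows, columns=fields, dtype=str)
```

With the helpers above, the bytes could come from `urlopen(DART_annc_info('corpCode')).read()`. Keeping every column as `str` preserves the leading zeros of codes such as `00919966`, which the `list.json` query expects.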

```python
req_url = DART_annc_info('list', corp_code='00919966', bgn_de='20130801', end_de='20150815')
req_url
```

```
'https://opendart.fss.or.kr/api/list.json?crtfc_key=<API_KEY>&corp_code=00919966&bgn_de=20130801&end_de=20150815'
```

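```python
def DART_response2df(response):
    """Turn a ('json', dict) response from list.json into a DataFrame."""
    # status '000' means success; other codes (e.g. '013': no data) come without 'list'.
    if response is None or response[0] != 'json' or response[1].get('status') != '000':
        return pd.DataFrame()  # empty frame instead of an UnboundLocalError/KeyError
    return pd.DataFrame(response[1]['list'])
```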
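`list.json` returns results one page at a time. The Open DART guide describes `page_no` and `page_count` request parameters (`page_count` defaults to 10, maximum 100) and a `total_page` field in the response, so a query that matches more filings than one page holds comes back truncated. A minimal sketch that walks every page, assuming those field names; `fetch_all_pages` is a hypothetical helper, and `get_page` is any callable that returns the parsed JSON body for a given page number.

```python
import pandas as pd


def fetch_all_pages(get_page):
    """Collect every page of a list.json search into one DataFrame.

    get_page(page_no) must return the parsed JSON body (a dict) for that
    page. Assumes the body carries 'status', 'total_page' and 'list'.
    """
    frames = []
    page_no = 1
    while True:
        body = get_page(page_no)
        # '000' means success; any other status (e.g. '013': no data) stops the loop.
        if body.get('status') != '000':
            break
        frames.append(pd.DataFrame(body['list']))
        # Stop once the last page reported by the API has been read.
        if page_no >= int(body.get('total_page', 1)):
            break
        page_no += 1
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

In this notebook, `get_page` could be `lambda p: DART_get_response(DART_annc_info('list', corp_code='00919966', bgn_de='20130801', end_de='20150815', page_no=p, page_count=100))[1]`. Taking a callable keeps the paging logic separate from the network call, which also makes it easy to test with canned responses.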

```python
res = DART_get_response(req_url)
res_df = DART_response2df(res)
res_df
```

|   | corp_code | corp_name | stock_code | corp_cls | report_nm | rcept_no | flr_nm | rcept_dt | rm |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 00919966 | 신라젠 | 215600 | K | 분기보고서 (2015.03) | 20150601000841 | 신라젠 | 20150601 | 정 |
| 1 | 00919966 | 신라젠 | 215600 | K | 주요사항보고서(중요한자산양수도결정) | 20150430001501 | 신라젠 | 20150430 |  |
| 2 | 00919966 | 신라젠 | 215600 | K | [기재정정]사업보고서 (2014.12) | 20150423000246 | 신라젠 | 20150423 | 연 |
| 3 | 00919966 | 신라젠 | 215600 | K | [기재정정]사업보고서 (2014.12) | 20150420000169 | 신라젠 | 20150420 | 정연 |
| 4 | 00919966 | 신라젠 | 215600 | K | 사업보고서 (2014.12) | 20150415000002 | 신라젠 | 20150415 | 정연 |
| 5 | 00919966 | 신라젠 | 215600 | K | 연결감사보고서 (2014.12) | 20150414001753 | 삼일회계법인 | 20150414 |  |
| 6 | 00919966 | 신라젠 | 215600 | K | 감사보고서 (2014.12) | 20150408001181 | 삼일회계법인 | 20150408 |  |
| 7 | 00919966 | 신라젠 | 215600 | K | [기재정정]감사보고서 (2013.12) | 20140327000924 | 남일회계법인 | 20140327 |  |
| 8 | 00919966 | 신라젠 | 215600 | K | 감사보고서 (2013.12) | 20140227000130 | 남일회계법인 | 20140227 | 정 |


## Loading the DART recent-announcements DB

```python
# Covers 2014-01-01 to 2020-04-04, with some dates missing.
all_anncs_df = pd.read_pickle('./all_anncs_df_2014.01.01-2020.04.04.pkl')
all_anncs_df
```

|   | corp_code | annc_title | annc_id | datetime |
|---|---|---|---|---|
| 56 | 00351579 | [첨부추가]단일판매ㆍ공급계약체결(자율공시) | 20130906900079 | 2014-01-02 09:38:00 |
| 55 | 00788773 | 기업설명회(IR)개최 | 20140102900026 | 2014-01-02 09:46:00 |
| 54 | 00359076 | 단일판매ㆍ공급계약체결(자율공시) | 20140102900031 | 2014-01-02 10:26:00 |
| 53 | 00524786 | [기재정정]현금ㆍ현물배당결정 | 20140102900048 | 2014-01-02 10:58:00 |
| 52 | 00361381 | 기타경영사항(자율공시) (공급 계약 체결 진행사항 안내) | 20140102900046 | 2014-01-02 10:59:00 |
| ... | ... | ... | ... | ... |
| 222531 | 00100601 | 기타시장안내 (상장폐지 관련 안내) | 20200403901244 | 2020-04-03 19:58:00 |
| 222530 | 00100601 | 주권매매거래정지기간변경 (상장폐지 사유 발생) | 20200403901247 | 2020-04-03 19:59:00 |
| 222529 | 00295857 | [기재정정]감사보고서제출 | 20200403901249 | 2020-04-03 20:31:00 |
| 222528 | 00295857 | 기타시장안내 (상장적격성 실질심사관련 안내) | 20200403901241 | 2020-04-03 20:33:00 |
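The DB is described as having some missing dates, and a filing made on such a day will find no `datetime` when the two sources are joined. A minimal sketch for locating candidate gaps, assuming the `datetime` column parses with `pd.to_datetime`; `missing_weekdays` is a hypothetical helper. It compares the dates present against plain Monday-to-Friday business days, so Korean public holidays also appear in the result and would have to be filtered out separately.

```python
import pandas as pd


def missing_weekdays(df, start, end, col='datetime'):
    """Return the Monday-Friday dates in [start, end] with no row in df[col].

    Korean public holidays are not excluded, so they show up here too.
    """
    # Reduce each timestamp to its calendar day (midnight).
    dates = pd.to_datetime(df[col]).dt.normalize()
    expected = pd.bdate_range(start, end)
    return expected[~expected.isin(dates)]
```

For example, `missing_weekdays(all_anncs_df, '2014-01-01', '2020-04-04')` would list every weekday without a single announcement in the DB.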


## Merging the Open API response DataFrame with the recent-announcements DB

Join the two on the receipt number (`rcept_no` in the API response, `annc_id` in the DB) to attach the minute-level `datetime` to each filing.

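```python
# Left-join on the receipt number so every API row is kept and gains the
# minute-level 'datetime' from the DB (missing where the DB has no match).
merged = res_df.merge(all_anncs_df[['annc_id', 'datetime']], how='left',
                      left_on='rcept_no', right_on='annc_id')
# rcept_dt (date only) is superseded by datetime; annc_id repeats rcept_no.
merged = merged.drop(columns=['rcept_dt', 'annc_id'])
merged
```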

|   | corp_code | corp_name | stock_code | corp_cls | report_nm | rcept_no | flr_nm | rm | datetime |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 00919966 | 신라젠 | 215600 | K | 분기보고서 (2015.03) | 20150601000841 | 신라젠 | 정 | 2015-06-01 16:37:00 |
| 1 | 00919966 | 신라젠 | 215600 | K | 주요사항보고서(중요한자산양수도결정) | 20150430001501 | 신라젠 |  | 2015-04-30 16:47:00 |
| 2 | 00919966 | 신라젠 | 215600 | K | [기재정정]사업보고서 (2014.12) | 20150423000246 | 신라젠 | 연 | 2015-04-23 15:39:00 |
| 3 | 00919966 | 신라젠 | 215600 | K | [기재정정]사업보고서 (2014.12) | 20150420000169 | 신라젠 | 정연 | 2015-04-20 14:36:00 |
| 4 | 00919966 | 신라젠 | 215600 | K | 사업보고서 (2014.12) | 20150415000002 | 신라젠 | 정연 | 2015-04-15 07:09:00 |
| 5 | 00919966 | 신라젠 | 215600 | K | 연결감사보고서 (2014.12) | 20150414001753 | 삼일회계법인 |  | 2015-04-14 16:36:00 |
| 6 | 00919966 | 신라젠 | 215600 | K | 감사보고서 (2014.12) | 20150408001181 | 삼일회계법인 |  | 2015-04-08 15:16:00 |
| 7 | 00919966 | 신라젠 | 215600 | K | [기재정정]감사보고서 (2013.12) | 20140327000924 | 남일회계법인 |  | 2014-03-27 17:32:00 |
| 8 | 00919966 | 신라젠 | 215600 | K | 감사보고서 (2013.12) | 20140227000130 | 남일회계법인 | 정 | 2014-02-27 11:41:00 |
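Because the merge is a left join, a filing whose `rcept_no` has no counterpart in the DB keeps its row with an empty `datetime` instead of being dropped. The minimal sketch below makes those rows explicit. It assumes both key columns hold the 14-digit receipt number; `attach_datetime` is a hypothetical helper. It casts both keys to `str` (pandas raises a `ValueError` when merging an object column with an int64 one), drops duplicate `annc_id` rows so the join cannot multiply filings, and uses the `indicator=True` option of `DataFrame.merge` to flag unmatched rows.

```python
import pandas as pd


def attach_datetime(api_df, db_df):
    """Left-join minute-level timestamps from db_df onto api_df.

    Returns (merged, unmatched), where unmatched holds the api_df rows whose
    rcept_no has no counterpart in db_df['annc_id'].
    """
    # Compare receipt numbers as strings on both sides.
    left = api_df.astype({'rcept_no': str})
    right = (db_df[['annc_id', 'datetime']]
             .astype({'annc_id': str})
             .drop_duplicates('annc_id'))  # one timestamp per receipt number

    merged = left.merge(right, how='left', left_on='rcept_no',
                        right_on='annc_id', indicator=True)
    # The indicator column '_merge' is 'left_only' where no DB row matched.
    unmatched = merged.loc[merged['_merge'] == 'left_only', list(api_df.columns)]
    merged = merged.drop(columns=['annc_id', '_merge'])
    return merged, unmatched
```

Called as `merged, unmatched = attach_datetime(res_df, all_anncs_df)`, it would return the joined table together with the filings that still need a timestamp from another source.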
