# Scraping Dynamic Web Pages
- Static webpages
  - A static site contains an HTML file for each page. The information on the page is delivered to the user exactly as it’s stored. All sites were built like this in the early days of the internet.
  - Now, this format is most often used to build sites where the content isn’t constantly changing. Scraping data from static pages is a straightforward process:

    1. Give the scraper the URL of the page you want to scrape.
    2. Identify the location of the data you want (for example, with the Inspect tool in Chrome).
    3. Extract the data using selectors.
    4. Export the data into a JSON or CSV file.
- Dynamic websites
  - Dynamic websites have continuously updating content, such as sites that deliver stock market data. These sites use Asynchronous JavaScript and XML (AJAX) to update parts of the page without a full page reload.
  - They do this by exchanging small data packets with the server in the background. AJAX makes scraping more complicated because the data is often not part of the initial HTML and has to be fetched again each time it changes.

  - To scrape a dynamic page, determine the format and destination of the server request so you can reproduce it, and the format of the response so you can parse it. In Chrome, you can identify the request using the following steps:
    1. With the Developer Tools panel open, click on Network to see all of the requests made for the page.
    2. Select a request and, under Headers, look for Form Data, which should contain the parameters sent with the AJAX request.
    3. Find the endpoint (the request URL) and the parameters that define the request.

  - You can find the response format by looking under the Response tab; it should be JSON or something similar. Now that you’ve identified the request parameters and response format, you can configure your web scraper.
  - There are two ways to scrape dynamic web pages:
    1. Automated browsers: simulate user actions with a locally driven web browser (e.g. Selenium, Splash), or
    2. Intercept AJAX calls: request the underlying data source directly.
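
For contrast with the dynamic case, the four static-page steps listed earlier can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the HTML snippet, the `books` id, and the `book`, `title`, and `price` classes are all hypothetical, and the page is hard-coded so the sketch runs without network access (in practice step 1 would fetch the page with `requests.get(url).text`).

```python
import csv
import io
import json

from bs4 import BeautifulSoup

# Step 1 would normally be requests.get(url).text; a hard-coded,
# hypothetical page is used here so the sketch runs offline.
SAMPLE_HTML = """
<html><body>
  <ul id="books">
    <li class="book"><span class="title">Book A</span><span class="price">10.50</span></li>
    <li class="book"><span class="title">Book B</span><span class="price">7.25</span></li>
  </ul>
</body></html>
"""


def extract_books(html):
    """Steps 2-3: locate each item with CSS selectors and pull out its text."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for item in soup.select("ul#books li.book"):
        records.append({
            "title": item.select_one("span.title").get_text(strip=True),
            "price": float(item.select_one("span.price").get_text(strip=True)),
        })
    return records


def to_json(records):
    """Step 4 (JSON): serialize the records as a JSON string."""
    return json.dumps(records, ensure_ascii=False, indent=2)


def to_csv(records):
    """Step 4 (CSV): serialize the records as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["title", "price"])
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


books = extract_books(SAMPLE_HTML)
print(to_json(books))
print(to_csv(books))
```

The selectors are the only page-specific part; they are what you would read off the Inspect tool for a real page.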

## AJAX (Asynchronous JavaScript And XML)
- Have you ever visited a page that automatically loads extra content as you scroll? Then you’ve seen AJAX pages in action. Social media sites with “infinite scroll” are the most common examples of AJAX pages. Still, AJAX can be found on any site that presents dynamic and constantly updating content.
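- A communication technique that uses JavaScript to let the browser and the server exchange data asynchronously.
- The client and the server exchange data as JSON or XML.
- With asynchronous requests, only the data that is needed is loaded, which reduces wasted resources.
- AJAX sends requests to the server through the XMLHttpRequest object.
- Because only the required data is received (as JSON or XML) and updated, it saves that much time and resources.
- Reference: https://www.w3schools.com/js/js_ajax_intro.asp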

## How to scrape AJAX website?
- Go to the page you want to scrape.
- Press F12 to open “Developer Tools”.
- Go to the “Network” tab.
- Select the XHR (XMLHttpRequest) filter, and refresh the page if the request list is empty.
- Explore the requests until you find the one you want, then open its “Headers” tab.
- Scroll to the “Form Data” field (present when it is a POST request).
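
Once the endpoint and the Form Data parameters are known, the POST request can be reproduced in code. The sketch below only builds the request with the standard library and does not send it; the endpoint `https://example.com/api/list` and the `page` and `pageSize` parameters are hypothetical placeholders for whatever the Network tab actually shows.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_ajax_post(endpoint, form_data, user_agent="Mozilla/5.0"):
    """Build (but do not send) a POST request that mirrors the Form Data field."""
    body = urlencode(form_data).encode("utf-8")
    headers = {
        "User-Agent": user_agent,
        "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
        # Often sent by browser-side AJAX libraries; whether a given site
        # requires it is an assumption to verify in the Network tab.
        "X-Requested-With": "XMLHttpRequest",
    }
    return Request(endpoint, data=body, headers=headers, method="POST")


# Hypothetical endpoint and parameters, for illustration only.
request = build_ajax_post(
    "https://example.com/api/list",
    {"page": 2, "pageSize": 20},
)
print(request.get_method(), request.full_url)
print(request.data)
```

Sending it would be a matter of passing `request` to `urllib.request.urlopen` and decoding the JSON response; the same idea works with `requests.post(endpoint, data=form_data, headers=headers)`.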

## URL convention
- The Joinsland web page no longer works, so we use another example.
- Example URL: https://store.steampowered.com/search/results/?query&start=1&count=100&tags=1702
  - https:// indicates that the website is accessed over a secure HTTPS connection.
  - store.steampowered.com: the domain name of the Steam Store website.
  - /search/results/ is the path that specifies the search results page.
  - ?query: the ? starts the query string, and query is its first parameter (given here without a value).
  - &start=1: the starting position of the search results (set to 1 here).
  - &count=100: the number of search results to be displayed per page.
  - &tags=1702: a tag or category filter for the search results (the value 1702 corresponds to a specific tag or category within the Steam Store).
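
The same breakdown can be reproduced with the standard library's `urllib.parse`, which is also a convenient way to build the URLs for further pages. The `page_url` helper is hypothetical, and treating `start` as an item offset that advances by `count` is an assumption about how this endpoint paginates.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

URL = "https://store.steampowered.com/search/results/?query&start=1&count=100&tags=1702"

parts = urlsplit(URL)
# keep_blank_values=True keeps the value-less "query" parameter
params = parse_qs(parts.query, keep_blank_values=True)

print(parts.scheme)   # protocol
print(parts.netloc)   # domain name
print(parts.path)     # path to the search results page
print(params)         # query parameters as a dict of lists


def page_url(start, count=100, tags=1702):
    """Rebuild the search URL for a different starting offset."""
    query = urlencode({"query": "", "start": start, "count": count, "tags": tags})
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


# Offsets for three consecutive pages of 100 results each (assumed scheme)
for start in (0, 100, 200):
    print(page_url(start))
```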


# Example 1: (Pagination) Naver Industry Analysis Reports
- http://developer88.tistory.com/428how-to-scrape-an-ajax-website-using-python-qw8fuitvi
- Scraping Naver's industry analysis reports (pages 1 to 3)
- modified by jyj (2024-7-18)

In [1]:
import requests
from bs4 import BeautifulSoup

In [2]:
url = 'https://finance.naver.com/research/industry_list.naver'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

In [3]:
response.status_code

200

In [4]:
# see page numbers (page navigation list)
pagenation = soup.find('table', class_='Nnavi')
print(pagenation)

<table align="center" class="Nnavi" summary="페이지 네비게이션 리스트">
<caption>페이지 네비게이션</caption>
<tr>
<td class="on">
<a href="/research/industry_list.naver?&amp;page=1">1</a>
</td>
<td>
<a href="/research/industry_list.naver?&amp;page=2">2</a>
</td>
<td>
<a href="/research/industry_list.naver?&amp;page=3">3</a>
</td>
<td>
<a href="/research/industry_list.naver?&amp;page=4">4</a>
</td>
<td>
<a href="/research/industry_list.naver?&amp;page=5">5</a>
</td>
<td>
<a href="/research/industry_list.naver?&amp;page=6">6</a>
</td>
<td>
<a href="/research/industry_list.naver?&amp;page=7">7</a>
</td>
<td>
<a href="/research/industry_list.naver?&amp;page=8">8</a>
</td>
<td>
<a href="/research/industry_list.naver?&amp;page=9">9</a>
</td>
<td>
<a href="/research/industry_list.naver?&amp;page=10">10</a>
</td>
<td class="pgR">
<a href="/research/industry_list.naver?&amp;page=11">
				다음<img alt="" border="0" height="5" src="https://ssl.pstatic.net/static/n/cmn/bu_pgarR.gif" width="3"/>
</a>
</td>
<td class="p

In [5]:
pages = pagenation.find_all('a')
pages

[<a href="/research/industry_list.naver?&amp;page=1">1</a>,
 <a href="/research/industry_list.naver?&amp;page=2">2</a>,
 <a href="/research/industry_list.naver?&amp;page=3">3</a>,
 <a href="/research/industry_list.naver?&amp;page=4">4</a>,
 <a href="/research/industry_list.naver?&amp;page=5">5</a>,
 <a href="/research/industry_list.naver?&amp;page=6">6</a>,
 <a href="/research/industry_list.naver?&amp;page=7">7</a>,
 <a href="/research/industry_list.naver?&amp;page=8">8</a>,
 <a href="/research/industry_list.naver?&amp;page=9">9</a>,
 <a href="/research/industry_list.naver?&amp;page=10">10</a>,
 <a href="/research/industry_list.naver?&amp;page=11">
 				다음<img alt="" border="0" height="5" src="https://ssl.pstatic.net/static/n/cmn/bu_pgarR.gif" width="3"/>
 </a>,
 <a href="/research/industry_list.naver?&amp;page=1150">맨뒤
 				<img alt="" border="0" height="5" src="https://ssl.pstatic.net/static/n/cmn/bu_pgarRR.gif" width="8"/>
 </a>]

In [6]:
pages[0]['href']

'/research/industry_list.naver?&page=1'

In [7]:
last_page = 3

for k in range(1, last_page+1):
    page_url = 'https://finance.naver.com' + pages[k-1]['href']
    print(page_url)

https://finance.naver.com/research/industry_list.naver?&page=1
https://finance.naver.com/research/industry_list.naver?&page=2
https://finance.naver.com/research/industry_list.naver?&page=3


In [8]:
page_url_0 = 'https://finance.naver.com/research/industry_list.naver?&page=1'
soup = BeautifulSoup(requests.get(page_url_0).text, 'html.parser')
table_body = soup.find('table', class_='type_1')
print(table_body)

<table cellpadding="0" cellspacing="0" class="type_1" summary="산업분석 리포트 게시판 글목록">
<caption>산업분석 리포트게시판</caption>
<col width="17%"/><col width="*%"/><col width="15%"/><col width="5%"/><col width="9%"/><col width="7%"/>
<tr>
<th>분류</th>
<th>제목</th>
<th style="text-align:left">증권사</th>
<th>첨부</th>
<th>작성일</th>
<th>조회수</th>
</tr>
<tr><td class="blank_07" colspan="6"></td></tr>
<tr>
<td style="padding-left:10">자동차</td>
<td><a href="industry_read.naver?nid=38161&amp;page=1">9월 미국 신차 판매</a></td>
<td>유진투자증권</td>
<td class="file"><a href="https://stock.pstatic.net/stock-research/industry/63/20241002_industry_358796000.pdf" target="_blank"><img align="absmiddle" alt="pdf" src="https://ssl.pstatic.net/imgstock/images5/down.gif"/></a></td>
<td class="date" style="padding-left:5px">24.10.02</td>
<td class="date">727</td>
</tr>
<tr>
<td style="padding-left:10">석유화학</td>
<td><a href="industry_read.naver?nid=38160&amp;page=1">OPEC 점유율 경쟁이 초래할 변화는?</a></td>
<td>유안타증권</td>
<td class="file"><a href="http

In [9]:
trs = table_body.find_all('tr')
trs[0]

<tr>
<th>분류</th>
<th>제목</th>
<th style="text-align:left">증권사</th>
<th>첨부</th>
<th>작성일</th>
<th>조회수</th>
</tr>

In [10]:
trs[1]

<tr><td class="blank_07" colspan="6"></td></tr>

In [11]:
trs[2]

<tr>
<td style="padding-left:10">자동차</td>
<td><a href="industry_read.naver?nid=38161&amp;page=1">9월 미국 신차 판매</a></td>
<td>유진투자증권</td>
<td class="file"><a href="https://stock.pstatic.net/stock-research/industry/63/20241002_industry_358796000.pdf" target="_blank"><img align="absmiddle" alt="pdf" src="https://ssl.pstatic.net/imgstock/images5/down.gif"/></a></td>
<td class="date" style="padding-left:5px">24.10.02</td>
<td class="date">727</td>
</tr>

In [12]:
trs[3]

<tr>
<td style="padding-left:10">석유화학</td>
<td><a href="industry_read.naver?nid=38160&amp;page=1">OPEC 점유율 경쟁이 초래할 변화는?</a></td>
<td>유안타증권</td>
<td class="file"><a href="https://stock.pstatic.net/stock-research/industry/18/20241002_industry_147225000.pdf" target="_blank"><img align="absmiddle" alt="pdf" src="https://ssl.pstatic.net/imgstock/images5/down.gif"/></a></td>
<td class="date" style="padding-left:5px">24.10.02</td>
<td class="date">566</td>
</tr>

- Let's try one row.

In [13]:
# from trs[2] ... trs[len(trs)-1]
tr = trs[2]
tds = tr.find_all('td')
tds

[<td style="padding-left:10">자동차</td>,
 <td><a href="industry_read.naver?nid=38161&amp;page=1">9월 미국 신차 판매</a></td>,
 <td>유진투자증권</td>,
 <td class="file"><a href="https://stock.pstatic.net/stock-research/industry/63/20241002_industry_358796000.pdf" target="_blank"><img align="absmiddle" alt="pdf" src="https://ssl.pstatic.net/imgstock/images5/down.gif"/></a></td>,
 <td class="date" style="padding-left:5px">24.10.02</td>,
 <td class="date">727</td>]

In [14]:
tds[0]

<td style="padding-left:10">자동차</td>

In [15]:
tds[1]

<td><a href="industry_read.naver?nid=38161&amp;page=1">9월 미국 신차 판매</a></td>

In [16]:
tds[1].a.text

'9월 미국 신차 판매'

In [17]:
tds[1].a['href']

'industry_read.naver?nid=38161&page=1'
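
Both kinds of `href` seen so far are relative: the page links start with `/` (relative to the site root), while the report links are relative to the current directory. As an alternative to concatenating strings by hand, `urllib.parse.urljoin` resolves both against the list page URL. This is a small sketch of that option, not the approach the notebook itself uses.

```python
from urllib.parse import urljoin

LIST_URL = "https://finance.naver.com/research/industry_list.naver"

# Root-relative link, as found in the page navigation table
page_link = urljoin(LIST_URL, "/research/industry_list.naver?&page=2")

# Document-relative link, as found in the report table
report_link = urljoin(LIST_URL, "industry_read.naver?nid=38161&page=1")

print(page_link)
print(report_link)
```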

In [18]:
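url_head = 'https://finance.naver.com/research/'

def get_research(tds):
    # tds[0] holds the category (분류); the brokerage (증권사) is in tds[2]
    category = tds[0].string
    title = tds[1].a.string
    url_query = tds[1].a['href']  # relative link, e.g. 'industry_read.naver?nid=...'
    result = {'category': category,
              'title': title,
              'url': url_head + url_query
    }
    return result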
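
A slightly more defensive variant of the row parser is sketched below. It runs on an abbreviated copy of the table shown above (the PDF link cell is emptied for brevity), skips header and separator rows explicitly, and also picks up the brokerage and date columns. The `parse_rows` name and the output keys are illustrative choices, not part of the original notebook.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

LIST_URL = "https://finance.naver.com/research/industry_list.naver"

# Abbreviated copy of the table shown above (header, blank row, one report row)
SAMPLE_TABLE = """
<table class="type_1">
<tr><th>분류</th><th>제목</th><th>증권사</th><th>첨부</th><th>작성일</th><th>조회수</th></tr>
<tr><td class="blank_07" colspan="6"></td></tr>
<tr>
<td style="padding-left:10">자동차</td>
<td><a href="industry_read.naver?nid=38161&amp;page=1">9월 미국 신차 판매</a></td>
<td>유진투자증권</td>
<td class="file"></td>
<td class="date" style="padding-left:5px">24.10.02</td>
<td class="date">727</td>
</tr>
</table>
"""


def parse_rows(table_html):
    """Return one dict per report row, ignoring header and separator rows."""
    soup = BeautifulSoup(table_html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        tds = tr.find_all("td")
        # Header rows have no <td>; separator rows have a single <td>
        if len(tds) < 6 or tds[1].a is None:
            continue
        rows.append({
            "category": tds[0].get_text(strip=True),
            "title": tds[1].a.get_text(strip=True),
            "broker": tds[2].get_text(strip=True),
            "date": tds[4].get_text(strip=True),
            "url": urljoin(LIST_URL, tds[1].a["href"]),
        })
    return rows


rows = parse_rows(SAMPLE_TABLE)
print(rows)
```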

## Put it all together

In [19]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

url_head = 'https://finance.naver.com/research/'

def get_research(tds):
    # Build a one-row record; values are wrapped in lists so that
    # pd.DataFrame(result) produces a single-row DataFrame
    category = tds[0].string
    title = tds[1].a.string
    research_url = url_head + tds[1].a['href']
    result = {'category': [category],
              'title': [title],
              'research_url': [research_url]
    }
    return result

url = 'https://finance.naver.com/research/industry_list.naver'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# links in the page navigation table: pages[0] is page 1, pages[1] is page 2, ...
pagenation = soup.find('table', class_='Nnavi')
pages = pagenation.find_all('a')

last_page = 3
reports = pd.DataFrame({"category":[],
                        "title":[],
                        "research_url":[]})

for k in range(1, last_page+1):
    page_url = 'https://finance.naver.com' + pages[k-1]['href']

    soup = BeautifulSoup(requests.get(page_url).text, 'html.parser')
    table_body = soup.find('table', class_='type_1')

    trs = table_body.find_all('tr')

    # trs[0] is the header row and trs[1] is a blank row
    for tr in trs[2:]:
        tds = tr.find_all('td')
        if len(tds) < 2:
            continue  # skip separator rows
        report = get_research(tds)
        reports = pd.concat([reports, pd.DataFrame(report)], ignore_index=True)

reports

Unnamed: 0,category,title,research_url
0,자동차,9월 미국 신차 판매,https://finance.naver.com/research/industry_re...
1,석유화학,OPEC 점유율 경쟁이 초래할 변화는?,https://finance.naver.com/research/industry_re...
2,자동차,"한국타이어앤테크놀로지, 한온시스템 인수 거..",https://finance.naver.com/research/industry_re...
3,보험,해약환급금준비금 제도 개선: 기대 소멸,https://finance.naver.com/research/industry_re...
4,석유화학,한화 정유/화학 Weekly 낮은 Waha 천연가스 ..,https://finance.naver.com/research/industry_re...
...,...,...,...
85,기타,양자 키우기,https://finance.naver.com/research/industry_re...
86,반도체,역사는 반복될까?,https://finance.naver.com/research/industry_re...
87,유틸리티,과전이하 오비이락,https://finance.naver.com/research/industry_re...
88,건설,미국 기준금리 인하 영향은 미미할 전,https://finance.naver.com/research/industry_re...


# Example 2: (Dynamic Web Content) Web scraping dynamic content using only Beautiful Soup
- Confirmed on July 18, 2024
- How to find the information source?
  0. Go to: https://finviz.com/quote.ashx?t=tsla
     - You see "income statements" in the middle of the page (not "cash flow").
     - Now you want to extract the "cash flow" information.
  1. Open Developer Tools -> Network -> XHR.
  2. Click the item you want to extract (clicking it loads the data dynamically).
  3. Watch how the network traffic changes whenever you click the item.
  4. Check "Preview" and "Headers" to find the source URL (the request URL shown in the headers, here https://finviz.com/api/statement.ashx?t=tsla&s=CA), which returns the data in JSON format.

- Accessing the URL using requests.get()
  - When using requests.get() to make HTTP requests in Python, providing headers can be important for several reasons:
    - User-Agent: Many websites use the User-Agent header to identify the client making the request. Some websites may block or limit access to their content based on the User-Agent. By setting a User-Agent header that resembles a common web browser, you can mimic a typical browser request and avoid being blocked or throttled.
    - Authentication: Some websites require authentication to access certain resources or APIs.
    - Content Negotiation: The Accept header allows you to specify the preferred content type for the response (e.g., JSON, XML, HTML). By setting the Accept header, you can ensure that the server returns the response in the format you desire.
    - Custom Headers: Certain APIs or web services may require specific custom headers to function correctly.
  - Normally, the default User-Agent should work fine for most APIs and web services.
  - However, if you encounter issues with specific APIs or web services, and they require a valid User-Agent header, you can use a generic User-Agent string like the one used by common web browsers.

In [57]:
# typical User-Agent string that mimics a web browser's User-Agent header
# why? they welcome human visitors, but discourage or restrict automated scraping
# due to server load, data privacy and so on.

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
           'Accept': 'application/json',}

url = 'https://finviz.com/quote.ashx?t=tsla'
response = requests.get(url, headers=headers)
response


<Response [200]>

In [58]:
soup = BeautifulSoup(response.text, 'html.parser')

In [59]:
len(soup.find_all('table'))

16

In [60]:
# there are 16 tables, and now we want to see their class names.

tables = soup.find_all('table')

for index, table in enumerate(tables):
    # Get the class attribute, which returns a list of class names
    class_names = table.get('class', [])
    print(f"Table {index + 1} Class Names: {class_names}")

Table 1 Class Names: ['header']
Table 2 Class Names: ['header-container']
Table 3 Class Names: ['w-full']
Table 4 Class Names: ['navbar']
Table 5 Class Names: ['header-container']
Table 6 Class Names: ['quote-price_wrapper_change', 'text-negative']
Table 7 Class Names: []
Table 8 Class Names: []
Table 9 Class Names: ['fullview-links']
Table 10 Class Names: ['js-snapshot-table', 'snapshot-table2', 'screener_snapshot-table-body']
Table 11 Class Names: []
Table 12 Class Names: ['js-table-ratings', 'styled-table-new', 'is-rounded', 'is-small']
Table 13 Class Names: []
Table 14 Class Names: ['fullview-news-outer', 'news-table']
Table 15 Class Names: ['body-table', 'styled-table-new', 'is-rounded', 'p-0', 'mt-2']
Table 16 Class Names: []


- The information we want to extract is in the table matched by the CSS selector 'table.styled-table-new.is-rounded.quote_statements-table.is-tabular-nums.is-free' (a table that carries all five of these classes).
- However, that table is not in the soup object, probably because it is loaded dynamically.

In [61]:
# 'table.cls1.cls2' (no space) matches a <table> that has all the listed classes
soup.select('table.styled-table-new.is-rounded.quote_statements-table.is-tabular-nums.is-free')  # not found

[]

- Now we get the information directly from the data server.
  - Watch how data moves in the Network tab (by refreshing the page or clicking).
  - Clicking 'balance sheet' and 'cash flow' alternately shows a new request being loaded each time.
  - Check the response under 'Preview'; if it is the data you want, find the 'Request URL' under 'Headers'.

In [62]:
# now we found the url for the data source.
# for 'cash flow'

cashflow = 'https://finviz.com/api/statement.ashx?t=tsla&so=F&s=CA'
cf = requests.get(cashflow, headers=headers)
cf.text

'{"currency":"USD","data":{"Period End Date":["TTM","12/31/2023","12/31/2022","12/31/2021","12/31/2020","12/31/2019","12/31/2018","12/31/2017"],"Period Length":["12 Months","12 Months","12 Months","12 Months","12 Months","12 Months","12 Months","12 Months"],"Net Income":["12,459.00","14,974.00","12,587.00","5,644.00","862.00","-775.00","-1,062.58","-2,240.58"],"Depreciation":["4,991.00","4,667.00","3,747.00","2,911.00","2,322.00","2,154.00","1,901.05","1,636.00"],"Other Funds (Non Cash)":["2,438.00","2,212.00","2,298.00","2,424.00","2,575.00","1,375.00","1,201.38","1,040.52"],"Funds from Operations":["13,672.00","15,504.00","18,632.00","10,979.00","5,759.00","2,754.00","2,039.85","435.95"],"Extraordinary Item":["0.00","0.00","0.00","0.00","0.00","0.00","0.00","0.00"],"Changes in Working Capital":["-2,140.00","-2,248.00","-3,908.00","518.00","184.00","-349.00","57.95","-496.60"],"Income Taxes Payable":["","","","","","","",""],"Cash from Operating Activities":["11,532.00","13,256.00","1

In [63]:
cfdata = cf.json()   # Response.json() parses the JSON body into a dict or list

In [64]:
type(cfdata)

dict

In [65]:
pd.DataFrame(cfdata).head(3)

Unnamed: 0,currency,data
Period End Date,USD,"[TTM, 12/31/2023, 12/31/2022, 12/31/2021, 12/3..."
Period Length,USD,"[12 Months, 12 Months, 12 Months, 12 Months, 1..."
Net Income,USD,"[12,459.00, 14,974.00, 12,587.00, 5,644.00, 86..."


- or,

In [66]:
import json
import pandas as pd
pd.DataFrame(json.loads(cf.text)).head()

Unnamed: 0,currency,data
Period End Date,USD,"[TTM, 12/31/2023, 12/31/2022, 12/31/2021, 12/3..."
Period Length,USD,"[12 Months, 12 Months, 12 Months, 12 Months, 1..."
Net Income,USD,"[12,459.00, 14,974.00, 12,587.00, 5,644.00, 86..."
Depreciation,USD,"[4,991.00, 4,667.00, 3,747.00, 2,911.00, 2,322..."
Other Funds (Non Cash),USD,"[2,438.00, 2,212.00, 2,298.00, 2,424.00, 2,575..."
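
In the DataFrames above, each cell of the `data` column still holds a whole list, because the useful mapping is nested one level down under the `"data"` key. A rough standard-library sketch of flattening it into one record per period, with numeric strings converted, follows. The sample is an abbreviated copy of the JSON shown above (first three periods only), and `to_number` and `statement_rows` are hypothetical helper names.

```python
import json

# Abbreviated sample of the JSON text shown above (first three periods only)
SAMPLE = '''
{"currency": "USD",
 "data": {"Period End Date": ["TTM", "12/31/2023", "12/31/2022"],
          "Net Income": ["12,459.00", "14,974.00", "12,587.00"],
          "Depreciation": ["4,991.00", "4,667.00", "3,747.00"],
          "Income Taxes Payable": ["", "", ""]}}
'''


def to_number(text):
    """Convert '12,459.00' to 12459.0; '' becomes None; other text is kept."""
    cleaned = text.replace(",", "").strip()
    if not cleaned:
        return None
    try:
        return float(cleaned)
    except ValueError:
        return text  # e.g. '12 Months'


def statement_rows(payload, label_key="Period End Date"):
    """Return one dict per period, with numeric values where possible."""
    data = payload["data"]
    rows = []
    for i, period in enumerate(data[label_key]):
        row = {label_key: period}
        for item, values in data.items():
            if item != label_key:
                row[item] = to_number(values[i])
        rows.append(row)
    return rows


rows = statement_rows(json.loads(SAMPLE))
for row in rows:
    print(row)
```

With pandas, passing only the inner mapping, `pd.DataFrame(cfdata['data'])`, should likewise give one column per line item, assuming every list has the same length.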


In [67]:
# for "balance sheet"

balance_sheet =  'https://finviz.com/api/statement.ashx?t=tsla&so=F&s=BA'
bs = requests.get(balance_sheet, headers=headers)
bsdata = bs.json()
pd.DataFrame(bsdata).head()

Unnamed: 0,currency,data
Period End Date,USD,"[12/31/2023, 12/31/2022, 12/31/2021, 12/31/202..."
Cash & Short Term Investments,USD,"[29,637.00, 22,479.00, 18,052.00, 19,622.00, 6..."
Short Term Receivables,USD,"[3,508.00, 2,952.00, 1,913.00, 1,886.00, 1,324..."
Inventories,USD,"[13,626.00, 12,839.00, 5,757.00, 4,101.00, 3,5..."
Other Current Assets,USD,"[2,845.00, 2,647.00, 1,378.00, 1,108.00, 713.0..."
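
The cash flow and balance sheet requests differ only in the `s` parameter (`CA` versus `BA`), so the request URL can be built from a small lookup table. The sketch below covers only the two codes observed above; the `statement_url` helper and its `extra` argument (for additional parameters such as `so=F`, whose meaning the examples here do not establish) are hypothetical.

```python
from urllib.parse import urlencode

API_URL = "https://finviz.com/api/statement.ashx"

# Statement codes observed in the request URLs above
STATEMENT_CODES = {"cash flow": "CA", "balance sheet": "BA"}


def statement_url(ticker, statement, extra=None):
    """Build the request URL for one ticker and one statement type."""
    params = {"t": ticker, "s": STATEMENT_CODES[statement]}
    if extra:
        params.update(extra)  # e.g. {"so": "F"}, as seen in some requests
    return f"{API_URL}?{urlencode(params)}"


for name in STATEMENT_CODES:
    print(name, statement_url("tsla", name))
```

Each URL can then be fetched with `requests.get(url, headers=headers).json()` as in the cells above.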


### Comment: (Direct access and Web scraping)
- We located the JSON source used for dynamic loading and fetched the JSON directly.
- Of course, it is also possible to navigate to the page by clicking and then scrape it.
- Why accessing the JSON directly is advantageous:
  - Simplicity: easier parsing and direct access
  - Efficiency: less overhead and faster loading
  - Reliability:
    - Consistent Data Structure: JSON files often have a consistent structure, which makes your data extraction scripts less prone to breaking if the web page layout changes.
    - Less Likely to be Blocked
  - API Usage: Many websites provide APIs for accessing their data. Using these APIs is usually within the terms of service, while scraping might violate them.

- If you want to extract only part of the data:

In [68]:
cashflow = 'https://finviz.com/api/statement.ashx?t=tsla&s=CA'
cf = requests.get(cashflow, headers=headers)
soup = BeautifulSoup(cf.content, 'html.parser')

In [69]:
soup

{"currency":"USD","data":{"Period End Date":["TTM","12/31/2023","12/31/2022","12/31/2021","12/31/2020","12/31/2019","12/31/2018","12/31/2017"],"Period Length":["12 Months","12 Months","12 Months","12 Months","12 Months","12 Months","12 Months","12 Months"],"Net Income":["12,459.00","14,974.00","12,587.00","5,644.00","862.00","-775.00","-1,062.58","-2,240.58"],"Depreciation":["4,991.00","4,667.00","3,747.00","2,911.00","2,322.00","2,154.00","1,901.05","1,636.00"],"Other Funds (Non Cash)":["2,438.00","2,212.00","2,298.00","2,424.00","2,575.00","1,375.00","1,201.38","1,040.52"],"Funds from Operations":["13,672.00","15,504.00","18,632.00","10,979.00","5,759.00","2,754.00","2,039.85","435.95"],"Extraordinary Item":["0.00","0.00","0.00","0.00","0.00","0.00","0.00","0.00"],"Changes in Working Capital":["-2,140.00","-2,248.00","-3,908.00","518.00","184.00","-349.00","57.95","-496.60"],"Income Taxes Payable":["","","","","","","",""],"Cash from Operating Activities":["11,532.00","13,256.00","14

In [70]:
cf.content

b'{"currency":"USD","data":{"Period End Date":["TTM","12/31/2023","12/31/2022","12/31/2021","12/31/2020","12/31/2019","12/31/2018","12/31/2017"],"Period Length":["12 Months","12 Months","12 Months","12 Months","12 Months","12 Months","12 Months","12 Months"],"Net Income":["12,459.00","14,974.00","12,587.00","5,644.00","862.00","-775.00","-1,062.58","-2,240.58"],"Depreciation":["4,991.00","4,667.00","3,747.00","2,911.00","2,322.00","2,154.00","1,901.05","1,636.00"],"Other Funds (Non Cash)":["2,438.00","2,212.00","2,298.00","2,424.00","2,575.00","1,375.00","1,201.38","1,040.52"],"Funds from Operations":["13,672.00","15,504.00","18,632.00","10,979.00","5,759.00","2,754.00","2,039.85","435.95"],"Extraordinary Item":["0.00","0.00","0.00","0.00","0.00","0.00","0.00","0.00"],"Changes in Working Capital":["-2,140.00","-2,248.00","-3,908.00","518.00","184.00","-349.00","57.95","-496.60"],"Income Taxes Payable":["","","","","","","",""],"Cash from Operating Activities":["11,532.00","13,256.00","
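
Since the response body is JSON rather than HTML, a single value can be picked out after parsing it with `json.loads`, which accepts the raw bytes directly. The sketch below uses an abbreviated copy of the bytes shown above (first three periods only); `value_for` is a hypothetical helper.

```python
import json

# Abbreviated copy of the bytes shown above (cf.content), first three periods only
content = (b'{"currency":"USD","data":{'
           b'"Period End Date":["TTM","12/31/2023","12/31/2022"],'
           b'"Net Income":["12,459.00","14,974.00","12,587.00"],'
           b'"Depreciation":["4,991.00","4,667.00","3,747.00"]}}')

payload = json.loads(content)  # json.loads accepts bytes as well as str
data = payload["data"]


def value_for(data, item, period):
    """Return the raw string for one line item in one period."""
    index = data["Period End Date"].index(period)
    return data[item][index]


net_income_2023 = value_for(data, "Net Income", "12/31/2023")
print(payload["currency"], net_income_2023)
```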

--------------------

# Exercise

In [71]:
df = pd.DataFrame({"a":[1,2,3], "b":[4,5,6]})
df

Unnamed: 0,a,b
0,1,4
1,2,5
2,3,6


In [72]:
dic = {"a":[77], "b":[88]}
pd.DataFrame(dic)

Unnamed: 0,a,b
0,77,88


In [73]:
pd.concat([df, pd.DataFrame(dic)], ignore_index=True)

Unnamed: 0,a,b
0,1,4
1,2,5
2,3,6
3,77,88
