## Web Scraping Nasdaq News Using Python

### Introduction

The Nasdaq (National Association of Securities Dealers Automated Quotations) Stock Market is an electronic stock exchange in the United States. It is the second largest stock exchange in the world by market capitalization.

The Nasdaq Stock Market website provides stock market news, business news, and financial news useful for data analysis on stocks and trading.

Scraping the Nasdaq news website, that is, extracting information from its pages, supports tasks such as stock price prediction, stock market sentiment analysis, and equity research.

### Project goal

To scrape the most active Nasdaq stocks from the Nasdaq website.

### Data

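Source: https://www.nasdaq.com/news/ (the most-active table scraped in this notebook is at https://www.nasdaq.com/markets/most-active.aspx)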

In [3]:
# Import the library used to download the page
import urllib.request

# Import BeautifulSoup to parse the HTML returned by the website
from bs4 import BeautifulSoup

In [20]:
# Specify the url
nasdaq = "https://www.nasdaq.com/markets/most-active.aspx"

In [21]:
# Open the connection and download html page from url
webpage = urllib.request.urlopen(nasdaq)
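
The download step can also be written with an explicit `User-Agent` header and a timeout. This is a minimal sketch, not the notebook's original method: the `USER_AGENT` string and the helper names `build_request` and `fetch_html` are hypothetical, and whether Nasdaq requires a particular user agent is an assumption rather than something the source states.

```python
import urllib.request

# Hypothetical identifier string; replace it with one that describes your script.
USER_AGENT = "nasdaq-most-active-notebook/0.1"


def build_request(url, user_agent=USER_AGENT):
    """Create a Request object that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, timeout=10):
    """Download the page and return its raw bytes (raises URLError on failure)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read()


# Usage (requires network access):
# html = fetch_html("https://www.nasdaq.com/markets/most-active.aspx")
```

The bytes returned by `fetch_html` can be passed straight to `BeautifulSoup`, and the `with` block closes the connection once the page has been read.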

In [22]:
# Parse the HTML in 'webpage' into a BeautifulSoup object.
# Naming the parser explicitly avoids the "no parser was explicitly specified" warning.
soup = BeautifulSoup(webpage, 'html.parser')

In [23]:
# Check nested structure of the html page
print(soup.prettify())

<!DOCTYPE html>
<html class="nasdaqCom inner no-js" lang="en-us" xmlns:fb="https://www.facebook.com/2008/fbml" xmlns:og="https://ogp.me/ns#">
 <head>
  <!-- Google Tag Manager -->
  <script>
   (function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':
new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],
j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src=
'https://www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);
})(window,document,'script','dataLayer','GTM-K2BQVP7');
  </script>
  <!-- End Google Tag Manager -->
  <script>
   (function(){
    var is_chrome;
    if(typeof navigator.vendor!="undefined")
	    is_chrome = ((navigator.userAgent.toLowerCase().indexOf('chrome') > -1) && (navigator.vendor.toLowerCase().indexOf("google") > -1));
    else
	    is_chrome = false;
})();
  </script>
  <!-- includes\2-column.master -->
  <meta charset="utf-8"/>
  <meta content="IE=Edge;chrome=1" http-equiv="X-UA-Compatible"/>


In [24]:
soup.title

<title>Most Active Stocks - Most Active Share Volume</title>

In [25]:
soup.title.string

'Most Active Stocks - Most Active Share Volume'

In [48]:
# Find every <div> whose CSS class is 'genTable'; the rows are read from the second match
table = soup.find_all('div', class_='genTable')
all_rows = table[1].find_all('tr')
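
To see the lookup in isolation, the same pattern can be run against a small inline HTML snippet. This is a sketch under stated assumptions: `SAMPLE_HTML` is invented for illustration (only the `genTable` class name comes from the notebook), and `extract_rows` is a hypothetical helper.

```python
from bs4 import BeautifulSoup

# Invented markup that mimics two 'genTable' divs; not the real Nasdaq page.
SAMPLE_HTML = """
<div class="genTable"><table><tr><td>ignored</td></tr></table></div>
<div class="genTable">
  <table>
    <tr><th>Symbol</th><th>Name</th></tr>
    <tr><td>
AAA
</td><td>Alpha Corp</td></tr>
    <tr><td>BBB</td><td>Beta Inc.</td></tr>
  </table>
</div>
"""


def extract_rows(html, table_index=1):
    """Return the <td> texts of every data row in the chosen 'genTable' div."""
    soup = BeautifulSoup(html, "html.parser")
    tables = soup.find_all("div", class_="genTable")
    if len(tables) <= table_index:
        raise ValueError("expected at least %d genTable divs" % (table_index + 1))
    rows = []
    for tr in tables[table_index].find_all("tr"):
        cells = tr.find_all("td")
        if cells:  # rows without <td> cells (such as a header row) are skipped
            rows.append([cell.get_text(strip=True) for cell in cells])
    return rows


rows = extract_rows(SAMPLE_HTML)
print(rows)
```

Checking the number of matches before indexing gives a clear error if the page layout changes, and `get_text(strip=True)` removes the surrounding newlines at extraction time.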

In [91]:
# Collect each column of the table in its own list
symbol = []
name = []
last_sale = []
change_net = []
share_volume = []

for row in all_rows:
    cols = row.find_all('td')
    # Rows without <td> cells (such as a header row) are skipped
    if cols:
        symbol.append(cols[0].text)
        name.append(cols[1].text)
        last_sale.append(cols[3].text)
        change_net.append(cols[4].text)
        share_volume.append(cols[5].text)

In [86]:
symbol

['\nENDP\n\n',
 '\nMREO\n\n',
 '\nBCEL\n\n',
 '\nVVUS\n\n',
 '\nOCC\n\n',
 '\nJFIN\n\n',
 '\nVNET\n\n',
 '\nJFU\n\n',
 '\nMNLO\n\n',
 '\nPHAS\n\n',
 '\nACOR\n\n',
 '\nCIH\n\n',
 '\nEVER\n\n',
 '\nEVLO\n\n',
 '\nTBIO\n\n',
 '\nNIU\n\n',
 '\nLOVE\n\n',
 '\nGMDA\n\n',
 '\nNICK\n\n',
 '\nCHMA\n\n']

In [99]:
symbol = [x.replace('\n', '') for x in symbol]
symbol

['ENDP',
 'MREO',
 'BCEL',
 'VVUS',
 'OCC',
 'JFIN',
 'VNET',
 'JFU',
 'MNLO',
 'PHAS',
 'ACOR',
 'CIH',
 'EVER',
 'EVLO',
 'TBIO',
 'NIU',
 'LOVE',
 'GMDA',
 'NICK',
 'CHMA']
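
An alternative to `replace('\n', '')` is `str.strip()`, which removes any leading and trailing whitespace, not only newlines. A small sketch; the third input value is a made-up example with stray spaces.

```python
# The first two values are taken from the output above; ' OCC \n' is hypothetical.
raw_symbols = ['\nENDP\n\n', '\nMREO\n\n', ' OCC \n']

# strip() trims spaces, tabs, and newlines from both ends of each string
clean_symbols = [s.strip() for s in raw_symbols]
print(clean_symbols)  # ['ENDP', 'MREO', 'OCC']
```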

In [50]:
name

['Endo International plc',
 'Mereo BioPharma Group plc',
 'Atreca, Inc.',
 'VIVUS, Inc.',
 'Optical Cable Corporation',
 'Jiayin Group Inc.',
 '21Vianet Group, Inc.',
 '9F Inc.',
 'Menlo Therapeutics Inc.',
 'PhaseBio Pharmaceuticals, Inc.',
 'Acorda Therapeutics, Inc.',
 'China Index Holdings Limited',
 'EverQuote, Inc.',
 'Evelo Biosciences, Inc.',
 'Translate Bio, Inc.',
 'Niu Technologies',
 'The Lovesac Company',
 'Gamida Cell Ltd.',
 'Nicholas Financial, Inc.',
 'Chiasma, Inc.']

In [45]:
last_sale

['$ 3.085 ',
 '$ 4.1521 ',
 '$ 20.48 ',
 '$ 3.90 ',
 '$ 3.9325 ',
 '$ 12.70 ',
 '$ 7.80 ',
 '$ 10.63 ',
 '$ 4.14 ',
 '$ 8.15 ',
 '$ 3.1848 ',
 '$ 2.38 ',
 '$ 23.90 ',
 '$ 6.88 ',
 '$ 8.69 ',
 '$ 8.22 ',
 '$ 18.9675 ',
 '$ 3.1932 ',
 '$ 9.30 ',
 '$ 5.39 ']
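
The prices are still strings with a dollar sign and padding, so they cannot be used in arithmetic yet. One possible conversion is sketched here; `parse_price` is a hypothetical helper, and the thousands-separator handling is an assumption, since no price in this output contains a comma.

```python
def parse_price(text):
    """Convert a string such as '$ 3.085 ' to a float."""
    return float(text.replace("$", "").replace(",", "").strip())


# The first three values from the last_sale output above
prices = [parse_price(p) for p in ['$ 3.085 ', '$ 4.1521 ', '$ 20.48 ']]
print(prices)  # [3.085, 4.1521, 20.48]
```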

In [46]:
change_net

['0.535\xa0▲\xa020.98% ',
 '0.6021\xa0▲\xa016.96% ',
 '2.88\xa0▲\xa016.36% ',
 '0.50\xa0▲\xa014.71% ',
 '0.4225\xa0▲\xa012.04% ',
 '1.29\xa0▲\xa011.31% ',
 '0.78\xa0▲\xa011.11% ',
 '0.91\xa0▲\xa09.36% ',
 '0.35\xa0▲\xa09.23% ',
 '0.67\xa0▲\xa08.96% ',
 '0.2548\xa0▲\xa08.70% ',
 '0.1799\xa0▲\xa08.18% ',
 '1.78\xa0▲\xa08.05% ',
 '0.49\xa0▲\xa07.67% ',
 '0.55\xa0▲\xa06.76% ',
 '0.52\xa0▲\xa06.75% ',
 '1.1875\xa0▲\xa06.68% ',
 '0.1932\xa0▲\xa06.44% ',
 '0.55\xa0▲\xa06.29% ',
 '0.31\xa0▲\xa06.10% ']
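
Each `change_net` entry packs two numbers into one string: the net change and the percentage change, separated by non-breaking spaces (`\xa0`) and an arrow. The sketch below splits them apart. `parse_change` is a hypothetical helper, and treating `▼` as a decline is an assumption, because only `▲` appears in this output.

```python
def parse_change(text):
    """Split a string such as '0.535\xa0▲\xa020.98% ' into (net, percent)."""
    # str.split() with no arguments also splits on non-breaking spaces
    parts = text.split()
    if len(parts) != 3:
        raise ValueError("unexpected change format: %r" % text)
    net_text, arrow, pct_text = parts
    # Assumption: a down arrow marks a negative change
    sign = -1.0 if arrow == "▼" else 1.0
    net = sign * float(net_text.replace(",", ""))
    pct = sign * float(pct_text.rstrip("%").replace(",", ""))
    return net, pct


print(parse_change('0.535\xa0▲\xa020.98% '))  # (0.535, 20.98)
```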

In [44]:
share_volume

['15,305,100 ',
 '69,523 ',
 '105,907 ',
 '2,729,029 ',
 '5,551 ',
 '13,584 ',
 '443,998 ',
 '196,175 ',
 '40,526 ',
 '56,029 ',
 '1,264,480 ',
 '37,901 ',
 '523,374 ',
 '23,272 ',
 '102,099 ',
 '217,151 ',
 '130,352 ',
 '21,665 ',
 '16,874 ',
 '189,200 ']
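
Because the goal is to find the most active stocks, it helps to turn the volume strings into integers so they can be sorted. A minimal sketch using three of the values above; `parse_volume` is a hypothetical helper.

```python
# Three symbols and their share volumes, copied from the outputs above
symbols = ['ENDP', 'MREO', 'VVUS']
raw_volumes = ['15,305,100 ', '69,523 ', '2,729,029 ']


def parse_volume(text):
    """Convert a string such as '15,305,100 ' to an int."""
    return int(text.replace(",", "").strip())


volumes = [parse_volume(v) for v in raw_volumes]

# Pair each symbol with its volume and sort from most to least active
ranked = sorted(zip(symbols, volumes), key=lambda pair: pair[1], reverse=True)
print(ranked)  # [('ENDP', 15305100), ('VVUS', 2729029), ('MREO', 69523)]
```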

In [104]:
import pandas as pd

# Dictionary of lists; named 'data' to avoid shadowing the built-in dict
data = {'Symbol': symbol, 'Name': name, 'Last_sale': last_sale, 'Change_net': change_net,
        'Share_volume': share_volume}

# Save as a DataFrame
df = pd.DataFrame(data)

df

|  | Symbol | Name | Last_sale | Change_net | Share_volume |
|---|---|---|---|---|---|
| 0 | ENDP | Endo International plc | $ 3.085 | 0.535 ▲ 20.98% | 15305100 |
| 1 | MREO | Mereo BioPharma Group plc | $ 4.1521 | 0.6021 ▲ 16.96% | 69523 |
| 2 | BCEL | Atreca, Inc. | $ 20.48 | 2.88 ▲ 16.36% | 105907 |
| 3 | VVUS | VIVUS, Inc. | $ 3.90 | 0.50 ▲ 14.71% | 2729029 |
| 4 | OCC | Optical Cable Corporation | $ 3.9325 | 0.4225 ▲ 12.04% | 5551 |
| 5 | JFIN | Jiayin Group Inc. | $ 12.70 | 1.29 ▲ 11.31% | 13584 |
| 6 | VNET | 21Vianet Group, Inc. | $ 7.80 | 0.78 ▲ 11.11% | 443998 |
| 7 | JFU | 9F Inc. | $ 10.63 | 0.91 ▲ 9.36% | 196175 |
| 8 | MNLO | Menlo Therapeutics Inc. | $ 4.14 | 0.35 ▲ 9.23% | 40526 |
| 9 | PHAS | PhaseBio Pharmaceuticals, Inc. | $ 8.15 | 0.67 ▲ 8.96% | 56029 |
