![Banner](https://i.imgur.com/UtzT8nU.jpg)

# Scraping Screens for Popular Investing Themes on Screener Using Python

Screener is one of the best tools for checking a company's fundamentals before investing. Very few other websites in India offer such a wide range of tools for comparing and analyzing company data, especially all in one place. We will scrape one of its [Screens](https://www.screener.in/screens/), `Companies creating new high`. With this information we can look for the factors that keep these companies close to their 52-week highs even when Indian and global share markets are volatile and falling, which can help us make better investment decisions. How will we extract the data? For that, let's learn about scraping.

Web scraping is an automated method of obtaining large amounts of data from websites. It can extract all the data on a site or only the specific data a user wants. In Python the process goes like this: first we download the page with the [requests](https://pypi.org/project/requests/) library, then we turn it into structured information with the help of [BeautifulSoup](https://beautiful-soup-4.readthedocs.io/en/latest/).

[More information about web scraping](https://www.geeksforgeeks.org/what-is-web-scraping-and-how-to-use-it/)

![Banner](https://i.imgur.com/V0uMNWo.jpg)

These are the steps we will follow to get the required data from the website:

- Install the libraries `requests` & `BeautifulSoup`.
- Download the web page using the `requests.get` function.
- Check the `status_code` & parse the HTML data using `BeautifulSoup`.
- Find the related tags & write custom code to collect the required information.
- Organize that data using Python lists & dictionaries.
- Then, using [Pandas](https://www.w3schools.com/python/pandas/default.asp) & its `DataFrame` class, convert this information to a more readable format & save it as a CSV file.

We are working toward a CSV file in the following format.

``` 
Company Name,Market Cap,Current Stock Price,52 Week High,52 Week Low,Stock P\E,Book Value,Dividend Yield ,Stock ROCE,Stock ROE,Company Web Links
Colgate-Palmoliv,"43,004","1,581","1,823","1,376",40.2,61.4,2.53,92.0,75.1,https://www.screener.in/company/COLPAL/
Praveg Comm.,267,144,166,62.2,20.9,9.49,2.77,73.7,63.4,https://www.screener.in/company/531637/
Bedmutha Indus.,282,87.6,101,22.2,1.12,24.3,0.00,71.2,,https://www.screener.in/company/BEDMUTHA/consolidated/...
```



#### How to Run the Code?
You can execute the code using the 'Run' button at the top of this page or by pressing `Shift+Enter`.
If you wish to make changes & save your own version of the notebook to [Jovian](https://jovian.ai), you can do so by executing the following cells.

In [3]:
!pip install jovian --upgrade --quiet

In [4]:
import jovian

In [5]:
jovian.commit(project="project-web-scraping-with-python")

[jovian] Updating notebook "naresh/web-scraping-of-screener-for-high-performing-stock-analysis" on https://jovian.ai
[jovian] Committed successfully! https://jovian.ai/naresh/web-scraping-of-screener-for-high-performing-stock-analysis


'https://jovian.ai/naresh/web-scraping-of-screener-for-high-performing-stock-analysis'

## Install the libraries `requests` & `BeautifulSoup`
We'll install the libraries using `pip` & load them using `import`.

In [6]:
!pip install requests --upgrade --quiet
!pip install BeautifulSoup4 --upgrade --quiet

In [7]:
import requests 
from bs4 import BeautifulSoup 

## Downloading the web page using the `requests.get` function
To download the page, we'll use the `get` function from `requests`, which returns a `Response` object.

In [8]:
url = 'https://www.screener.in/screens/214283/companies-creating-new-high/'
response = requests.get(url)

In [9]:
# we can check the type of response
type(response)

requests.models.Response

### Checking whether the page was downloaded successfully
We need to check the `status_code` of the response: if it is between 200 and 299, the download was successful. More details on [status codes](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status).

In [10]:
response.status_code

200
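
The range check described above can be written as a small helper. This is a minimal sketch; `is_success` is a hypothetical name, not part of `requests`. The functions defined later in this notebook use `response.ok` instead, which `requests` treats as true for any status code below 400.

```python
def is_success(status_code):
    """Return True when an HTTP status code is in the 2xx (success) range."""
    return 200 <= status_code <= 299

# Sample codes: success, redirect, client error
for code in (200, 301, 404):
    print(code, is_success(code))
```

With a real response this would be called as `is_success(response.status_code)`.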

As the request was successful, let's get the content of the page using `response.text`.

In [11]:
page_content = response.text

In [12]:
# Checking the length of the content fetched from the website
len(page_content)

41157

The page contains more than 41,000 characters, so let's look at the first 300.

In [13]:
page_content[:300]

'\n<!DOCTYPE html>\n<html lang="en">\n\n<head>\n  <!-- Basic Page Needs -->\n  <meta charset="utf-8">\n  <title>Companies creating new high - Screener</title>\n  <meta name="author" content="Mittal Analytics Private Limited">\n\n  <!-- PWA manifest -->\n  <meta name="theme-color" content="#fff" media="(prefers-'

We now have the HTML source of the page. To view it locally we need to save the content to a file; the code below does that, and the file can also be downloaded.

In [14]:
with open('url.html', 'w') as f:
    f.write(page_content)

In [15]:
with open('url.html', 'r') as f:
    html_source = f.read()

Now if we open the `url.html` file, it should look like the preview below.
![](https://i.imgur.com/RhauBHn.png)

So far we have successfully downloaded the web page using the `requests` library and stored it in an HTML file using `with open`.
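
One detail worth noting about the save step: `open` falls back to a platform-dependent default encoding when none is given, so a page containing non-ASCII characters (the rupee sign, for example) may fail to write or may read back differently on some systems. Below is a minimal sketch of the same write-and-read round trip with an explicit encoding. The sample string and the temporary directory are stand-ins for the real page content and `url.html`.

```python
import os
import tempfile

# Stand-in for response.text, with a non-ASCII character in it
page_content = '<html><head><title>Price in ₹</title></head><body></body></html>'

# Write to a temporary directory so the sketch does not touch real files
path = os.path.join(tempfile.mkdtemp(), 'url.html')

with open(path, 'w', encoding='utf-8') as f:
    f.write(page_content)

with open(path, 'r', encoding='utf-8') as f:
    html_source = f.read()

print(html_source == page_content)
```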

## Parse the HTML data using `BeautifulSoup`

As we have already installed `bs4` & imported `BeautifulSoup` from it, we can directly convert `html_source` to a BeautifulSoup document.

Once we have the document, we will use the `.find` method to find the first tag matching a specification & `.find_all` to find all matching tags.

In [16]:
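# Assigning the parsed data to a variable so we can use it easily in multiple places.
# Naming the parser explicitly avoids the warning bs4 shows when it has to guess one.
doc = BeautifulSoup(html_source, 'html.parser')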

In [17]:
# Confirming that the file was successfully converted to a BeautifulSoup object.
type(doc)

bs4.BeautifulSoup

In [18]:
# To check that the BeautifulSoup doc is working, let's try looking up a tag.
doc.title

<title>Companies creating new high - Screener</title>

In [19]:
# With '.text' we get only the text inside the tag, without the surrounding markup.
doc.title.text

'Companies creating new high - Screener'

### Find the related tags & write custom code to collect the required information

In [20]:
company_name_tags = doc.find_all('a', target="_blank")

In [21]:
# Checking how many company-name tags were found
len(company_name_tags)

25
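
To see how `.find`, `.find_all` and `.text` behave without downloading anything, here is a small sketch on a hand-written HTML fragment. The fragment and the company names in it are made up for illustration; only the `target="_blank"` attribute mirrors the filter used in the cell above.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment shaped like a few rows of a screen table
sample_html = """
<table>
  <tr><td><a href="/company/AAA/" target="_blank"> Alpha Ltd </a></td></tr>
  <tr><td><a href="/company/BBB/" target="_blank"> Beta Ltd </a></td></tr>
  <tr><td><a href="/screens/">All screens</a></td></tr>
</table>
"""
sample_doc = BeautifulSoup(sample_html, 'html.parser')

# .find returns only the first matching tag
first_link = sample_doc.find('a')
# .find_all returns every <a> whose target attribute is "_blank"
company_links = sample_doc.find_all('a', target='_blank')

print(first_link.text.strip())
print(len(company_links))
print([link['href'] for link in company_links])
```

The third link has no `target` attribute, so the attribute filter leaves it out.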

### Parsing multiple web pages & collecting data in a single file
We now have a way to get data from the website and convert it to a BeautifulSoup document. As the cell above shows, one page gives us the details of only 25 companies, so we need to collect information from multiple pages. Below is the code to scrape & parse more than one page at once and gather everything in one list.

In [22]:
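def get_web_page(base_url, topic, n):
    # Fetch pages 1 .. n-1 of a screen and return one BeautifulSoup document per page.
    topic_url = base_url + topic
    docs = []
    for page in range(1, n):
        # Page 1 is the plain screen URL; later pages use the ?page= query parameter.
        page_url = topic_url if page == 1 else topic_url + '?page=' + str(page)
        response = requests.get(page_url)
        # Check every page, not only the first one.
        if not response.ok:
            print('status code', response.status_code)
            raise Exception('failed to fetch web page ' + page_url)
        docs.append(BeautifulSoup(response.text, 'html.parser'))
    return docs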

The function needs a few inputs to work. `get_web_page` joins the base URL of the website with the path of the chosen topic, then scrapes the pages one by one in a [for loop](https://www.w3schools.com/python/python_for_loops.asp) and collects the parsed documents in a [list](https://www.w3schools.com/python/python_lists.asp). Because the loop runs over `range(1, n)`, it fetches pages 1 to `n - 1`, so passing `n = 6` returns 5 documents.

In [23]:
base_url = 'https://www.screener.in/screens/'
topic = '214283/companies-creating-new-high/'
docs = get_web_page(base_url, topic, 6)
len(docs)

5
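
Why does `n = 6` give 5 documents? The sketch below builds only the URLs that `get_web_page` requests, without making any network calls. `page_urls` is a hypothetical helper introduced here for illustration.

```python
def page_urls(base_url, topic, n):
    """Return the URLs for pages 1 .. n-1, mirroring the loop in get_web_page."""
    topic_url = base_url + topic
    urls = []
    for page in range(1, n):          # range(1, n) stops at n - 1
        if page == 1:
            urls.append(topic_url)    # the first page has no query string
        else:
            urls.append(topic_url + '?page=' + str(page))
    return urls

urls = page_urls('https://www.screener.in/screens/',
                 '214283/companies-creating-new-high/', 6)
print(len(urls))
print(urls[0])
print(urls[-1])
```

The last URL ends in `?page=5`, so five pages of 25 companies each account for the 125 tags found further down.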

Once we have the information, we will write it to an HTML file so it can be downloaded locally, and then convert it to a BeautifulSoup document to extract the required information.

In [24]:
with open('docs.html', 'w') as f:
    f.write(str(docs))

In [25]:
with open('docs.html', 'r') as f:
    html_source = f.read()

In [26]:
main_doc = BeautifulSoup(html_source, 'html.parser')

In [27]:
type(main_doc)

bs4.BeautifulSoup

Now we need to find the list of company names and URLs using the functions below. Before that, we need to identify the tags that hold the name and URL, which we can do by looking at `main_doc` or by right-clicking the required element on the website and choosing Inspect.
![tag](https://i.imgur.com/O5dJrdd.jpg)

Here is what the screenshot above shows. When we right-click a company name and choose Inspect, a panel opens on the right of the screen, and there we can see that the `<a>` tag wrapping the company name text has the attribute `target="_blank"`. We checked several names and the same attribute is present on each, so we can use it to get the names.

In [28]:
name_tag = main_doc.find_all(target="_blank")
len(name_tag)

125

In [29]:
# checking that we are getting the right information
name_tag[0].text.strip()

'Life Insurance'

### Defining the function to get the list of company names from the extracted `main_doc`

In [30]:
def company_name_list(tags):
    name_list = []
    for i in range(len(tags)):
        name_list.append(tags[i].text.strip())
    return name_list

Now we get the list of company names.

In [31]:
name_list = company_name_list(name_tag)
len(name_list)

125
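
The same extraction can be written more compactly by looping over the tags directly instead of over their indices. This is a sketch, assuming the tags come from `find_all` as above; the HTML fragment is made up so the example runs on its own.

```python
from bs4 import BeautifulSoup

def company_name_list(tags):
    """Return the stripped text of each tag."""
    return [tag.text.strip() for tag in tags]

# Hypothetical fragment with extra whitespace around the names
sample_html = '<a target="_blank">\n  Alpha Ltd\n</a><a target="_blank"> Beta Ltd </a>'
sample_tags = BeautifulSoup(sample_html, 'html.parser').find_all(target='_blank')

print(company_name_list(sample_tags))
```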

### Let's define a function to extract the list of company URLs

In [32]:
def company_url_list(tags):
    url_list = []
    main_url = "https://www.screener.in"
    for i in range(len(tags)):
        url_list.append(main_url + tags[i]['href'])
    return url_list

Now we get the list of company URLs & confirm that its length equals that of the list of names.

In [33]:
url_list = company_url_list(name_tag)
len(url_list)

125

In [34]:
# Checking what the list looks like.
url_list[:5]

['https://www.screener.in/company/LICI/',
 'https://www.screener.in/company/COLPAL/',
 'https://www.screener.in/company/531637/',
 'https://www.screener.in/company/EASEMYTRIP/',
 'https://www.screener.in/company/542285/']
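
String concatenation works here because every `href` collected from the page starts with `/`. If that assumption ever fails (for example, a link that is already absolute), `urllib.parse.urljoin` from the standard library resolves the link the way a browser would. A small sketch follows: the first two `href` values correspond to the output above, and the third is a hypothetical absolute link added to show the difference.

```python
from urllib.parse import urljoin

main_url = 'https://www.screener.in'
sample_hrefs = [
    '/company/LICI/',
    '/company/COLPAL/',
    'https://www.screener.in/company/531637/',  # hypothetical: already absolute
]

# urljoin resolves relative links against main_url and leaves absolute ones untouched
sample_urls = [urljoin(main_url, href) for href in sample_hrefs]
for url in sample_urls:
    print(url)
```

Plain concatenation would have turned the third entry into an invalid URL with the domain repeated.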

### We have the lists of company names & URLs; now we need to define a function to get the information for each company
Below is the list of values we need to scrape from the web page of each firm.

- Market Cap
- Current Stock Price
- 52 Week High
- 52 Week Low
- Stock P\E
- Book Value
- Dividend Yield(%)
- Stock ROCE(%)
- Stock ROE(%)

![](https://i.imgur.com/3hMzhb1.png)



## Below we define a function to get the `BeautifulSoup` object of an individual company

This is used to extract the information mentioned above.

In [35]:
def com_doc(url):
    # Download one company page and return it as a BeautifulSoup document.
    response = requests.get(url)
    if not response.ok:
        print('status code', response.status_code)
        raise Exception('failed to fetch web page ' + url)
    doc = BeautifulSoup(response.text, 'html.parser')
    return doc

### We need to create a dictionary with empty lists, so we can fill these lists once we have all the information

In [36]:
company_info = {'Company Name': name_list,
               'Market Cap': [],
               'Current Stock Price': [],
               '52 Week High': [],
               '52 Week Low': [],
               'Stock P\E': [],
               'Book Value': [],
               'Dividend Yield(%)': [],
               'Stock ROCE(%)': [],
               'Stock ROE(%)': [],
               'Company Screen Links': url_list}

### To collect each company's information in the empty lists of the dictionary, we will use the function below
#### [More information on how to fill empty lists in a dictionary](https://www.geeksforgeeks.org/appending-to-list-in-python-dictionary/)

In [37]:
def company_data(url):
        company_dict = {'Company Name': name_list,
               'Market Cap': [],
               'Current Stock Price': [],
               '52 Week High': [],
               '52 Week Low': [],
               'Stock P\E': [],
               'Book Value': [],
               'Dividend Yield(%)': [],
               'Stock ROCE(%)': [],
               'Stock ROE(%)': [],
               'Company Screen Links': url_list}
        for i in range(len(url)):
            doc = com_doc(url[i])
            tag = doc.find_all('span', class_ = "number")     
            company_dict['Market Cap'].append(tag[0].text)
            company_dict['Current Stock Price'].append(tag[1].text)
            company_dict['52 Week High'].append(tag[2].text)
            company_dict['52 Week Low'].append(tag[3].text)
            company_dict['Stock P\E'].append(tag[4].text)
            company_dict['Book Value'].append(tag[5].text)
            company_dict['Dividend Yield(%)'].append(tag[6].text)
            company_dict['Stock ROCE(%)'].append(tag[7].text)
            company_dict['Stock ROE(%)'].append(tag[8].text)
        return company_dict
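
`company_data` relies on the order of the `<span class="number">` elements: index 0 is the market cap, index 1 the current price, and so on. The sketch below makes that mapping explicit by pairing a list of field names with the tags, and fails loudly when a page has fewer numbers than expected. `FIELDS`, `extract_ratios` and the HTML fragment are hypothetical and only illustrate the idea; the real company page has much more markup around these values.

```python
from bs4 import BeautifulSoup

# Field names in the order the numbers appear on the page (same order as company_data).
# The backslash is doubled so the key matches the column name used in this notebook.
FIELDS = ['Market Cap', 'Current Stock Price', '52 Week High', '52 Week Low',
          'Stock P\\E', 'Book Value', 'Dividend Yield(%)', 'Stock ROCE(%)',
          'Stock ROE(%)']

def extract_ratios(doc):
    """Map each field name to the text of the matching <span class="number">."""
    tags = doc.find_all('span', class_='number')
    if len(tags) < len(FIELDS):
        raise ValueError('expected %d numbers, found %d' % (len(FIELDS), len(tags)))
    return {field: tag.text.strip() for field, tag in zip(FIELDS, tags)}

# Made-up fragment holding the nine values of the first row of the final CSV
values = ['554,101', '876', '920', '860', '191', '10.1', '0.00', '310', '81.7']
sample_html = ''.join('<span class="number">%s</span>' % v for v in values)

ratios = extract_ratios(BeautifulSoup(sample_html, 'html.parser'))
print(ratios['Market Cap'], ratios['Stock ROE(%)'])
```

Raising an error on a short page keeps the column lists the same length, which `pd.DataFrame` requires when it is built from a dictionary of lists.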

In [38]:
# Store the result under a new name so the `company_data` function is not overwritten.
company_dict = company_data(url_list)
len(company_dict)

11

### We now have 11 columns & 125 rows of data. Let's convert this [dictionary](https://www.w3schools.com/python/python_dictionaries.asp) of lists to a more readable format with the help of [Pandas](https://www.w3schools.com/python/pandas/default.asp)

In [39]:
import pandas as pd

In [40]:
complete_data = pd.DataFrame(company_dict)

In [41]:
# Checking that the data was successfully converted to a `pd.DataFrame`
type(complete_data)

pandas.core.frame.DataFrame

In [42]:
complete_data

Unnamed: 0,Company Name,Market Cap,Current Stock Price,52 Week High,52 Week Low,Stock P\E,Book Value,Dividend Yield(%),Stock ROCE(%),Stock ROE(%),Company Screen Links
0,Life Insurance,554101,876,920,860,191,10.1,0.00,310,81.7,https://www.screener.in/company/LICI/
1,Colgate-Palmoliv,43661,1605,1823,1376,40.8,61.4,2.49,92.0,75.1,https://www.screener.in/company/COLPAL/
2,Praveg Comm.,279,151,166,62.2,21.9,9.49,2.65,73.7,63.4,https://www.screener.in/company/531637/
3,Easy Trip Plann.,8982,413,444,96.1,78.5,8.58,0.24,65.5,46.5,https://www.screener.in/company/EASEMYTRIP/
4,Axita Cotton,334,170,171,18.2,23.0,11.1,0.00,59.0,66.6,https://www.screener.in/company/542285/
...,...,...,...,...,...,...,...,...,...,...,...
120,Sanmit Infra,662,419,435,84.0,95.0,15.3,0.08,14.8,13.2,https://www.screener.in/company/532435/
121,Insecticid.India,1511,740,847,511,14.2,410,0.27,14.8,13.0,https://www.screener.in/company/INSECTICID/
122,Ganesh Housing,2291,274,314,57.1,32.5,60.2,0.00,14.7,14.1,https://www.screener.in/company/GANESHHOUC/con...
123,DU Digital,114,438,450,57.0,1265,5.96,0.00,14.5,6.02,https://www.screener.in/company/DUGLOBAL/conso...
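
Every value scraped so far is still a string, including numbers with thousands separators such as `'43,004'`, so the columns cannot yet be sorted or compared numerically. One possible way to convert them is sketched below on a small hand-made frame. The rows reuse values from the sample CSV at the top of this notebook, and the empty string stands for the missing ROE of the third company.

```python
import pandas as pd

sample = pd.DataFrame({
    'Company Name': ['Colgate-Palmoliv', 'Praveg Comm.', 'Bedmutha Indus.'],
    'Market Cap': ['43,004', '267', '282'],
    'Stock ROE(%)': ['75.1', '63.4', ''],
})

for column in ['Market Cap', 'Stock ROE(%)']:
    # Drop the thousands separators before converting
    without_commas = sample[column].str.replace(',', '', regex=False)
    # errors='coerce' turns anything unparseable (here the empty string) into NaN
    sample[column] = pd.to_numeric(without_commas, errors='coerce')

print(sample.dtypes)
print(sample.sort_values('Market Cap', ascending=False))
```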


## Changing the data to a CSV file

### Using [to_csv](https://stackoverflow.com/questions/16923281/writing-a-pandas-dataframe-to-csv-file) on the Pandas DataFrame

In [43]:
# index=False because the first column of row numbers is unnecessary in the CSV file.
complete_data.to_csv('screen.csv', index=False)

In [44]:
# checking the first few rows of the file to confirm that the obtained information matches the website.
!head screen.csv

Company Name,Market Cap,Current Stock Price,52 Week High,52 Week Low,Stock P\E,Book Value,Dividend Yield(%),Stock ROCE(%),Stock ROE(%),Company Screen Links
Life Insurance,"554,101",876,920,860,191,10.1,0.00,310,81.7,https://www.screener.in/company/LICI/
Colgate-Palmoliv,"43,661","1,605","1,823","1,376",40.8,61.4,2.49,92.0,75.1,https://www.screener.in/company/COLPAL/
Praveg Comm.,279,151,166,62.2,21.9,9.49,2.65,73.7,63.4,https://www.screener.in/company/531637/
Easy Trip Plann.,"8,982",413,444,96.1,78.5,8.58,0.24,65.5,46.5,https://www.screener.in/company/EASEMYTRIP/
Axita Cotton,334,170,171,18.2,23.0,11.1,0.00,59.0,66.6,https://www.screener.in/company/542285/
TCS,"1,261,787","3,449","4,046","3,052",32.9,244,1.10,54.9,43.6,https://www.screener.in/company/TCS/consolidated/
La Tim Metal & I,172,194,218,68.4,9.56,33.8,0.26,53.9,190,https://www.screener.in/company/505693/consolidated/
Anand Rathi Wea.,"2,655",638,721,542,21.0,82.6,0.78,51.6,43.3,https://www.screener.in/company/ANANDRA
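
Notice that values such as `"43,661"` are wrapped in quotes in the file: the comma inside the number would otherwise be read as a column separator. Any CSV-aware reader handles this. Here is a short sketch with the standard-library `csv` module, using the first three columns of two rows from the output above in place of the real `screen.csv`.

```python
import csv
import io

# Header and first three columns of two rows copied from the output above
sample_csv = (
    'Company Name,Market Cap,Current Stock Price\n'
    'Colgate-Palmoliv,"43,661","1,605"\n'
    'Praveg Comm.,279,151\n'
)

rows = list(csv.DictReader(io.StringIO(sample_csv)))
for row in rows:
    print(row['Company Name'], '|', row['Market Cap'], '|', row['Current Stock Price'])
```

Splitting the same lines on every comma by hand would break `43,661` into two fields, which is why a CSV reader is the safer choice for reading the file back.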

In [None]:
jovian.commit(files=['screen.csv'])

## Summary
Here is a recap of what we have covered so far.

- Install the libraries `requests` & `BeautifulSoup`.
- Download the web page using the `requests.get` function.
- Check the `status_code` & parse the HTML data using `BeautifulSoup`.
- Find the related tags & write custom code to collect the required information.
- Organize that data using Python lists & dictionaries.
- Then, using [Pandas](https://www.w3schools.com/python/pandas/default.asp) & its `DataFrame` class, convert this information to a more readable format & save it as a CSV file.

The final CSV file has the following format:

``` 
Company Name,MCap,Current Stock Price,52 Week High,52 Week Low,Stock P\E,Book Value,Dividend Yield ,Stock ROCE,Stock ROE,Company Web Links
Colgate-Palmoliv,"43,004","1,581","1,823","1,376",40.2,61.4,2.53,92.0,75.1,https://www.screener.in/company/COLPAL/
Praveg Comm.,267,144,166,62.2,20.9,9.49,2.77,73.7,63.4,https://www.screener.in/company/531637/
Bedmutha Indus.,282,87.6,101,22.2,1.12,24.3,0.00,71.2,,https://www.screener.in/company/BEDMUTHA/consolidated/...
```


### Following is the list of functions we defined over the course of this project:

In [50]:
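def get_web_page(base_url, topic, n):
    # Fetch pages 1 .. n-1 of a screen and return one BeautifulSoup document per page.
    topic_url = base_url + topic
    docs = []
    for page in range(1, n):
        # Page 1 is the plain screen URL; later pages use the ?page= query parameter.
        page_url = topic_url if page == 1 else topic_url + '?page=' + str(page)
        response = requests.get(page_url)
        # Check every page, not only the first one.
        if not response.ok:
            print('status code', response.status_code)
            raise Exception('failed to fetch web page ' + page_url)
        docs.append(BeautifulSoup(response.text, 'html.parser'))
    return docs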

def company_name_list(tags):
    name_list = []
    for i in range(len(tags)):
        name_list.append(tags[i].text.strip())
    return name_list

def company_url_list(tags):
    url_list = []
    main_url = "https://www.screener.in"
    for i in range(len(tags)):
        url_list.append(main_url + tags[i]['href'])
    return url_list
    
def com_doc(url):
    # Download one company page and return it as a BeautifulSoup document.
    response = requests.get(url)
    if not response.ok:
        print('status code', response.status_code)
        raise Exception('failed to fetch web page ' + url)
    doc = BeautifulSoup(response.text, 'html.parser')
    return doc

def company_data(url):
        company_dict = {'Company Name': name_list,
               'Market Cap': [],
               'Current Stock Price': [],
               '52 Week High': [],
               '52 Week Low': [],
               'Stock P\E': [],
               'Book Value': [],
               'Dividend Yield(%)': [],
               'Stock ROCE(%)': [],
               'Stock ROE(%)': [],
               'Company Screen Links': url_list}
        for i in range(len(url)):
            doc = com_doc(url[i])
            tag = doc.find_all('span', class_ = "number")     
            company_dict['Market Cap'].append(tag[0].text)
            company_dict['Current Stock Price'].append(tag[1].text)
            company_dict['52 Week High'].append(tag[2].text)
            company_dict['52 Week Low'].append(tag[3].text)
            company_dict['Stock P\E'].append(tag[4].text)
            company_dict['Book Value'].append(tag[5].text)
            company_dict['Dividend Yield(%)'].append(tag[6].text)
            company_dict['Stock ROCE(%)'].append(tag[7].text)
            company_dict['Stock ROE(%)'].append(tag[8].text)
        return company_dict
    


In [None]:
# The last two functions of the previous cell can also be combined into one if there is a specific requirement.

def company_data(url):
    mcap_list = []
    cprice = []
    weekhigh = []
    weeklow = []
    stockpe = []
    bv = []
    dividend = []
    roce = []
    roe = []
    for i in range(len(url)):
        response = requests.get(url[i])
        if not response.ok:
            print('status code', response.status_code)
            raise Exception('failed to fetch web page ' + url[i])
        doc = BeautifulSoup(response.text, 'html.parser')
        tag = doc.find_all('span', class_ = "number")
        mcap_list.append(tag[0].text)
        cprice.append(tag[1].text)
        weekhigh.append(tag[2].text)
        weeklow.append(tag[3].text)
        stockpe.append(tag[4].text)
        bv.append(tag[5].text)
        dividend.append(tag[6].text)
        roce.append(tag[7].text)
        roe.append(tag[8].text)
    company_info = { 'Company Name': name_list,
           'Market Cap': mcap_list,
           'Current Stock Price': cprice,
           '52 Week High': weekhigh,
           '52 Week Low': weeklow,
           'Stock P\E': stockpe,
           'Book Value': bv,
           'Dividend Yield(%)': dividend,
           'Stock ROCE(%)': roce,
           'Stock ROE(%)': roe,
           'Company Screen Links': url_list,}
    return company_info

### Future Work

- We can also extract data from other screens on Screener to get more insight into the selected stocks.
- To get the results & detailed information on an individual stock, we can scrape the company pages individually.
- Screener is good for fundamental analysis, but for a technical overview we can scrape 'TradingView.com' for technical indicators.


# References

- https://www.screener.in/screens/
- https://www.screener.in/screens/214283/companies-creating-new-high/
- https://pypi.org/project/requests/
- https://developer.mozilla.org/en-US/docs/Web/HTTP/Status
- https://beautiful-soup-4.readthedocs.io/en/latest/
- https://www.w3schools.com/python/python_for_loops.asp
- https://www.w3schools.com/python/python_lists.asp
- https://www.geeksforgeeks.org/what-is-web-scraping-and-how-to-use-it/
- https://www.w3schools.com/python/pandas/default.asp
- https://www.geeksforgeeks.org/appending-to-list-in-python-dictionary/


In [2]:
jovian.commit()