Unlisted Zone Shares

Parse and extract financial information for stock analysis of companies listed on Unlisted Zone.


Figure: 'Studying the Business Analytics' by Tima Miroshnichenko | Pexels Licensed



◘ Navigation



◘ Introduction

• Background

Unlisted Zone is one of the leading startups in India facilitating the purchase and sale of a myriad of Unlisted, ESOP, and Pre-IPO shares. This study uses Scrapy to crawl the aforementioned website, extract the available data, and store it in a database table (using sqlite3) as well as in memory-based formats using a Pandas DataFrame.


• Objectives

  • Crawl the target site.
  • Parse the page.
  • Scrape relevant data.
  • Store inside a database and memory.



◘ Technical Preliminaries

• Database Specification

Attribute       Data Type   Description
Company         text        Name of the company.
Lot Size        text        Number of shares in one lot.
Last Price      text        Price at which the most recent trade was made.
Cost per Lot    text        Price of a single lot of shares.
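A minimal sketch of creating the table above with Python's built-in sqlite3 module; the column names follow the specification, the database filename comes from the repository tree, and the table name `unlisted_shares` and the inserted row are hypothetical illustrations:

```python
import sqlite3

# Open (or create) the database file used by this project.
conn = sqlite3.connect("db_unlisted.db")

# Create the table described in the specification above.
# Column names containing spaces are quoted; all fields are TEXT.
conn.execute("""
    CREATE TABLE IF NOT EXISTS unlisted_shares (
        "Company"      TEXT,
        "Lot Size"     TEXT,
        "Last Price"   TEXT,
        "Cost per Lot" TEXT
    )
""")

# Insert one illustrative row (values are hypothetical).
conn.execute(
    "INSERT INTO unlisted_shares VALUES (?, ?, ?, ?)",
    ("Example Ltd", "100", "250", "25000"),
)
conn.commit()
conn.close()
```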

• Repository Organization


│
├── LICENSE                        
│ 
├── README.md	#The top-level README for developers using this project.  
│ 
├── database    #SQLite3 database files                                                  
|
│   ├── db_unlisted.db    	                        
│
│   └── db_unlisted.db.sql                
│ 
├── datasets    #Memory storage using different formats                                                 
|
│   ├── unlisted_df.xlsx    	                        
│
│   ├── unlisted_df.csv                            
│
│   └── unlisted_df.json                            
│
├── files	#Contains various files related to this repository.   
│
├── requirements.txt	#Allows reproducing the analysis environment, generated using `pip freeze > requirements.txt` command.                        
│                         			
├── unlisted_zone	#Source codes for this project.                        
|
│    ├── __init__.py	                        
│   
│    ├── scrapy.cfg    #Configuration settings for Scrapy.                        
|
│    ├── sqlite3    #SQLite table-creation scripts.                            
|
|            └── __init__.py	                        
|
|            └── sqlite_table.py	  
|
│    ├── py_utils    #Utilities folder.                            
|
|            └── __init__.py	                        
|
|            └── common_utils.py	                        
|
│            └── scraping_utils.py                                
|
│            └── sqlite_utils.py                  
│   
│    ├── unlisted_zone    #Scrapy project folder.                            
|
|            └── spiders	                        
|
|                    └── __init__.py                        
|
|                    └── crawler_bot.py                        
|
│            └── items.py                        
│   
│            └── middlewares.py                        
│   
│            └── pipelines.py                        
│   
│            └── settings.py                        


• Scraping Flow

Figure: process flow of the scraping pipeline.



◘ Requisites

• Technologies

  1. Python 3.11
  2. PyCharm IDE (2023.1)

• Packages

  1. Scrapy==2.10.0
  2. scrapy-proxy-pool==0.1.9
  3. scrapy-user-agents==0.1.1
  4. Scrapy3==1.0.1
  5. db-sqlite3==0.0.1
  6. pandas==2.0.0

• Technical Utilization

  1. XPath Selectors
  2. User Agents
  3. Automated Pagination
  4. Temporary Containers
  5. SQLite for Database storage
  6. Storage in Excel, JSON and CSV format
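As an illustration of the XPath selectors and automated pagination listed above, here is a minimal sketch using Python's built-in xml.etree.ElementTree module, which supports a limited XPath subset. The sample HTML, class names, and field names are hypothetical stand-ins for the real page (a real Scrapy spider would use response.xpath instead):

```python
import xml.etree.ElementTree as ET

# Hypothetical snippet standing in for one page of the share table.
PAGE = """
<html><body>
  <table>
    <tr><td class="name">Example Ltd</td><td class="price">250</td></tr>
    <tr><td class="name">Sample Inc</td><td class="price">310</td></tr>
  </table>
  <a class="next" href="/page/2">Next</a>
</body></html>
"""

def parse_page(html):
    root = ET.fromstring(html)
    rows = []
    # XPath selector: every <tr> inside the table.
    for tr in root.findall(".//table/tr"):
        cells = tr.findall("td")
        rows.append({"company": cells[0].text, "last_price": cells[1].text})
    # Automated pagination: follow the "next" link if present.
    nxt = root.find(".//a[@class='next']")
    next_url = nxt.get("href") if nxt is not None else None
    return rows, next_url

rows, next_url = parse_page(PAGE)
```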

◘ Installation

• Project Installation

  1. To begin using the project, create a virtual environment:
python -m venv /path/to/new/virtual/environment
  2. Activate the virtual environment (the command varies between Windows and Linux-based systems):
source </path/to/new/virtual/environment>/bin/activate
  • For instance:
python -m venv .scrape
source .scrape/bin/activate
  3. Install the dependencies: pip install -r requirements.txt


• Getting Started with Scrapy

  • To create a new project, run the following command in the terminal:

    scrapy startproject my_first_spider                                                              
  • To create a new spider, first change the directory:

    cd my_first_spider                                                                   
  • Creating a spider:

    scrapy genspider example example.com                                                         

Upon creating a spider, a basic template is generated. The generated class is populated with the name and domain supplied in the previous command, but the parse method must be written by the user.
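A completed parse method typically yields one item per scraped row and then hands the next page back to the crawler. The sketch below shows that logic against a minimal stand-in for the response object so it runs without Scrapy installed; the `FakeResponse` class and all field values are hypothetical, and in a real spider parse is a method of the Spider class using response.xpath(...):

```python
class FakeResponse:
    """Hypothetical stand-in for scrapy.http.Response (illustration only)."""
    def __init__(self, rows, next_page=None):
        self.rows = rows            # pre-extracted table rows
        self.next_page = next_page  # URL of the next page, if any

def parse(response):
    # Yield one item per table row, matching the database specification.
    for company, lot, price, cost in response.rows:
        yield {"Company": company, "Lot Size": lot,
               "Last Price": price, "Cost per Lot": cost}
    # Pagination: in Scrapy this would be `yield response.follow(...)`.
    if response.next_page is not None:
        yield {"next_page": response.next_page}

items = list(parse(FakeResponse([("Example Ltd", "100", "250", "25000")],
                                next_page="/page/2")))
```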

  • Run the spider and export data to CSV or JSON
scrapy crawl example                                                           
scrapy crawl example -o name_of_file.csv                                                           
scrapy crawl example -o name_of_file.json                                                           

• Scrapy Directory

Figure: directory diagram of a Scrapy project.



◘ Resources

For more details, visit the following links:



◘ License

• MIT License

Copyright (c) 2023 Shahriar Rahman

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

