GitHub

Market Scanner

About 🛈

An SPA that compiles vape products from several websites into one table that can be navigated, searched, and filtered. The main motivation was to get all available products from several websites in one clean table, as some vape websites are usually messy and have clunky filter/search functionalities.

Users can click on a product and it will redirect them to the item’s page on its corresponding website.

Only two websites have been scraped, but can be expanded upon using Scrapy (See below)

Built with 🔧

Flask
Bootstrap
SQLAlchemy
Scrapy

Features 📋

Responsive display
Pagination
o Data of adjacent pages are fetched/deleted in the background asynchronically based on the current page, providing smooth and fast transition between pages
Data filtering
o Filter based on website, name, product attributes, and other categories
o Combine filters (concurrently or consecutively)
o Filter options are based on the data fetched (i.e. filtered data will have a filter that contains exclusively their attributes for further filtering)

⛔ Declaimer ⛔

Crawling of the items from the original source website may not always yield perfect results due to the poor HTML structure of some sites. As a result, some data on this website may appear as "No Data" in the table. Some items may be out of stock. Please note that we do not own or claim any rights to the data displayed here; all data belong to their respective websites.

🚭 This website does not endorse or promote smoking or vaping in any way. We encourage our visitors to make informed and responsible choices regarding their health and well-being.

Usage 🧮

1- If you want to use Visual Studio, just run the .sln file, build the environment, and run the server.
2- Alternatively, you can just create your own virtualenv in the command line and run flask.

Scrapy 🕷

You can use this demo to get the hang of Scrapy. To use the scrapping files in this project:

1- Go to one of the folders that end with “_scraper “ (e.g. vapegateae_scraper)
2- Modify the spider to match the website you are targeting.
3- Maintain the format that is returned by the parseItem function (in the spider.py)
4- Run the spider

scrapy crawl spider -O websitedata.json

5- Run the postScraperJsonCleaner on the outputted json. This will:
a. Clean the data (this will vary between websites as they each have their own mess and inconsistencies)
b. Place the cleaned data into a json compatible with the function that will populate the DB
6- Uncomment code corresponding in the populateDB function in app.py
7- Run the function by running the flask app and calling on localhost/filldb
8- Data now should be in the DB, refreshing the homepage will show the new data (along with the old ones)

Name		Name	Last commit message	Last commit date
Latest commit History 57 Commits
instance		instance
static		static
templates		templates
vape-shop-dubai_scraper		vape-shop-dubai_scraper
vapegateae_scraper		vapegateae_scraper
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
MarketScanner.pyproj		MarketScanner.pyproj
MarketScanner.sln		MarketScanner.sln
README.md		README.md
app.py		app.py
demo.gif		demo.gif
models.py		models.py
requirements.txt		requirements.txt
run.py		run.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Market Scanner

About 🛈

Built with 🔧

Features 📋

⛔ Declaimer ⛔

Usage 🧮

Scrapy 🕷

Demo ⏵

About

Releases

Packages

Languages

License

moustafa2121/MarketScanner

Folders and files

Latest commit

History

Repository files navigation

Market Scanner

About 🛈

Built with 🔧

Features 📋

⛔ Declaimer ⛔

Usage 🧮

Scrapy 🕷

Demo ⏵

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages