Merge pull request #32 from SanjanaBankar/Updated-README
Update README.md
sanjay-kv authored May 12, 2024
2 parents afc4899 + 7a1735b commit 86b8bd4
Showing 1 changed file with 34 additions and 29 deletions.
63 changes: 34 additions & 29 deletions README.md
@@ -1,53 +1,58 @@
<img src="https://raw.githubusercontent.com/sanjay-kv/Semi-supervised-sequence-learning-Project/main/imdb_review_scrapping/Header_images/Imdb_scrapping.png" align="center"/>

<h1 align="center">IMDB Movie review Scraping</h1>
<blockquote align="center">Scraping the movie review ✏️ using python programming language💻. </blockquote>
<h2 align="center">🎬IMDB Movie Review Scraping📊</h2>
<blockquote align="center">Scraping movie reviews ✏️ using the Python programming language💻. </blockquote>

<p align="center">To generate new data for the <b>Semi-supervised-sequence-learning-Project</b>, we have written a Python script that fetches 📊 data from the 💻 IMDb website and converts it into txt files. </p>
<p align="center">This project aims to replicate the Semi-supervised-sequence-learning-Project on a new dataset generated through scraping IMDb movie reviews. The generated data will be utilized for further analysis and exploration.
</p>
🔍Welcome to the IMDb Movie Review Scraper project! 🌟 This Python script is designed to scrape movie reviews from IMDb, providing valuable data for analysis and research purposes. The IMDb Movie Review Scraping project aims to gather a new dataset by automatically extracting movie reviews from IMDb. This dataset will support various natural language processing tasks, including sentiment analysis and recommendation systems. Using web scraping techniques, such as Beautiful Soup, movie reviews are collected, preprocessed, and structured into a CSV format suitable for analysis, including Support Vector Machine classification. 📈
## Features

**`Semi-supervised-sequence-learning-Project`**: the replication is carried out here, and further analysis requires creating new data.

1. Scraping Movie Reviews 🕵️‍♂️
- `Movie_review_imdb_scrapping.ipynb` - The script fetches user reviews from IMDb, providing access to a diverse range of opinions and feedback for different movies. It utilizes BeautifulSoup, a powerful Python library for web scraping, to extract data from IMDb's web pages efficiently and accurately. 🎥🔎

2. Customizable Scraper 🛠️
- `rename_files.ipynb` - Users can customize the scraper to target specific time periods, ratings, and other parameters, enabling focused data collection based on their requirements. This flexibility allows researchers, analysts, and enthusiasts to tailor the scraping process to their specific needs. 🎯🔧

# Introduction
3. CSV Output 📁
- `convert_texts_to_csv.ipynb` - The scraped data is saved into a CSV file, allowing for easy import into data analysis software or further processing. The CSV format ensures compatibility with a wide range of tools and platforms, making it convenient to incorporate the scraped data into various workflows and projects. 💾💼
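
The scraping step described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the HTML snippet, the class names (`review`, `rating`, `text`) and the helper `extract_reviews` are all hypothetical, and the real IMDb page structure will differ. It parses an inline string so that it runs without any network access.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real IMDb page structure and class names may differ.
SAMPLE_HTML = """
<div class="review">
  <span class="rating">8</span>
  <p class="text">A tense, well-acted thriller.</p>
</div>
<div class="review">
  <span class="rating">3</span>
  <p class="text">Slow and predictable.</p>
</div>
"""


def extract_reviews(html):
    """Return a list of {"rating": int, "text": str} dicts from review markup."""
    soup = BeautifulSoup(html, "html.parser")
    reviews = []
    for block in soup.find_all("div", class_="review"):
        rating = block.find("span", class_="rating")
        text = block.find("p", class_="text")
        if rating is None or text is None:
            continue  # skip entries that are missing a rating or a body
        reviews.append({
            "rating": int(rating.get_text(strip=True)),
            "text": text.get_text(strip=True),
        })
    return reviews


reviews = extract_reviews(SAMPLE_HTML)
for review in reviews:
    print(review["rating"], review["text"])
```

In a real run, the HTML would come from a downloaded review page rather than a constant, and the selectors would need to be adapted to whatever markup the page actually uses.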

**`Semi-supervised-sequence-learning-Project`** :computer: The IMDb Movie Review Scraping project aims to gather a new dataset by automatically extracting movie reviews from IMDb. This dataset will support various natural language processing tasks, including sentiment analysis and recommendation systems. Using web scraping techniques, such as Beautiful Soup, movie reviews are collected, preprocessed, and structured into a CSV format suitable for analysis, including Support Vector Machine classification.

- The project includes the following scripts.

- `Movie_review_imdb_scrapping.ipynb` - Script to scrape the data from imdb website
- `rename_files.ipynb` - Script to rename the scrapped text files as per the requirements
- `convert_texts_to_csv.ipynb` - Python script to make a CSV file from the txt files for SVM processing
## Getting Started

- `Movie_review_imdb_scrapping.ipynb` - Script to scrape the data from IMDb website
- `rename_files.ipynb` - Script to rename the scraped text files as per the requirements
- `convert_texts_to_csv.ipynb` - Python script for converting the scraped text files into a CSV format suitable for SVM processing
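
As a rough idea of what the text-to-CSV conversion might look like, here is a minimal sketch using only the standard library. The function name `texts_to_csv`, the column names (`file`, `review`) and the sample file names are assumptions for illustration; the notebook itself may organise the data differently.

```python
import csv
import tempfile
from pathlib import Path


def texts_to_csv(text_dir, csv_path):
    """Write one CSV row (file name, review text) per .txt file in text_dir."""
    text_dir = Path(text_dir)
    rows = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["file", "review"])  # hypothetical column names
        for txt_file in sorted(text_dir.glob("*.txt")):
            review = txt_file.read_text(encoding="utf-8").strip()
            writer.writerow([txt_file.name, review])
            rows += 1
    return rows


# Demo on throwaway files so the sketch runs without any scraped data.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "0_8.txt").write_text("Great film.\n", encoding="utf-8")
    (tmp / "1_3.txt").write_text("Not for me.\n", encoding="utf-8")
    out_file = tmp / "reviews.csv"
    written = texts_to_csv(tmp, out_file)
    with open(out_file, newline="", encoding="utf-8") as handle:
        csv_rows = list(csv.reader(handle))

print(written, csv_rows)
```

The `csv` module quotes reviews that contain commas or line breaks, so the resulting file can be read back by pandas or any other CSV-aware tool.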
**Dependencies**

Make sure you have the following dependencies installed:

* Python 3.x
* BeautifulSoup (Install using ```pip install beautifulsoup4```)
* Pandas (Install using ```pip install pandas```)

**Installation**

## Dependencies

Ensure Beautifulsoup is installed using `pip install beautifulsoup4`

## Installation

**1️⃣ Fork the `Semi-supervised-sequence-learning-Project/` repository**
Follow these instructions on [how to fork a repository](https://help.github.com/en/articles/fork-a-repo)

**2️⃣ Cloning the repository**
Once you have set up your fork of the `/Semi-supervised-sequence-learning-Project` repository, you'll want to clone it to your local machine. This is so you can make and test all of your personal edits before adding them to the master version of `/Semi-supervised-sequence-learning-Project`.

Navigate to the location on your computer where you want to host your code. Once in the appropriate folder, run the following command to clone the repository to your local machine.
1. **Fork the `Semi-supervised-sequence-learning-Project/` repository**
Follow these instructions on [how to fork a repository](https://help.github.com/en/articles/fork-a-repo)

2. Clone the repository to your local machine.
```
git clone git@github.com:your-username/Semi-supervised-sequence-learning-Project.git
```

## Final Dataset
## Usage

- Customize the scraper settings in the scraper.py file as per your requirements. This includes specifying the time period, ratings, and any other parameters you want to filter by.

1️⃣ Here is the Link to **Final Dataset:** [Drive Link](https://drive.google.com/file/d/1sTNAeuy-99Hao0V5AOVznLXyDJC2zuFn/view?usp=sharing)
- Run the scraper.py script:

`python scraper.py`

- The scraped data will be saved into a CSV file named data.csv in the data_scrapped directory.
## Contribution
🎉Contributions are welcome! If you have any suggestions for improvements or new features, please feel free to submit a pull request. Your contributions help make this project better for everyone. 🚀
## Final Dataset

🔬Here is the Link to **Final Dataset:** [Drive Link](https://drive.google.com/file/d/1sTNAeuy-99Hao0V5AOVznLXyDJC2zuFn/view?usp=sharing) containing the scraped IMDb movie reviews. This dataset can be used for analysis, research, or any other purposes you require. 📦
## Support

🤝For any issues regarding the scraper, feel free to open an issue on GitHub. We'll be happy to assist you with any problems or inquiries you may have. 🛠️
