
Commit 0456129

Added Reddit-scraping-and-flair-detection folder
1 parent ce7e971 commit 0456129

13 files changed

+4559
-0
lines changed

.DS_Store

12 KB
Binary file not shown.
6 KB
Binary file not shown.

Reddit-scraping-and-flair-detection/Exploratory-Data-Analysis(EDA).ipynb

Lines changed: 1015 additions & 0 deletions
Large diffs are not rendered by default.

Reddit-scraping-and-flair-detection/Modelling.ipynb

Lines changed: 1364 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
# Reddit Flair Detector
## Steps followed:

Each step is described, along with its code, in the notebooks.

### Step 1: Extraction of r/india data
Used Python's praw library for extraction.

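A minimal sketch of what the extraction step might look like, not the notebook's actual code. The helper names `make_client` and `collect_posts`, the choice of the `hot` listing, and the set of fields kept are assumptions made for illustration; the praw calls themselves (`praw.Reddit`, `reddit.subreddit`, and the `Submission` attributes) are part of the library.

```python
def make_client(client_id, client_secret, user_agent):
    """Build a read-only praw client; credentials come from a Reddit app."""
    import praw  # imported lazily so collect_posts can be used without praw

    return praw.Reddit(
        client_id=client_id,
        client_secret=client_secret,
        user_agent=user_agent,
    )


def collect_posts(subreddit, limit=100):
    """Return one dict per submission, skipping posts that have no flair."""
    rows = []
    for post in subreddit.hot(limit=limit):
        if post.link_flair_text is None:
            continue  # unlabeled posts cannot serve as training examples
        rows.append({
            "id": post.id,
            "title": post.title,
            "body": post.selftext,
            "url": post.url,
            "flair": post.link_flair_text,
        })
    return rows


# Usage (needs real credentials and network access):
# reddit = make_client("CLIENT_ID", "CLIENT_SECRET", "flair-detector sketch")
# rows = collect_posts(reddit.subreddit("india"), limit=500)
```

The resulting list of dicts could then be written to a CSV file such as the `data.csv` included in this commit, although the columns chosen here are hypothetical.
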
### Step 2: Exploratory Data Analysis
Analysed the data using graphs, scatter plots, and correlations, plotted with the matplotlib library.

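The kind of analysis described here can be sketched as follows. This is an illustration under assumptions, not the notebook's code: the README does not say which columns were compared, so the `score` and `num_comments` names in the usage comment are hypothetical, and the correlation is computed by hand as a plain Pearson coefficient.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def save_scatter(xs, ys, x_label, y_label, path):
    """Draw a scatter plot of ys against xs and save it to an image file."""
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend, so no display is needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(xs, ys)
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)
    ax.set_title(f"r = {pearson(xs, ys):.2f}")
    fig.savefig(path)
    plt.close(fig)


# Usage with hypothetical columns:
# save_scatter(scores, comment_counts, "score", "num_comments", "scatter.png")
```
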
### Step 3: Made Reddit Flair Detector. Performed the following steps:
- Preprocessing the data: Removed stopwords and performed stemming on the data.
- Dividing into training and test sets: Divided the dataset into training and test sets, using a standard 0.7:0.3 split.
- Testing across classifiers: Tested 3 classifiers (Naive Bayes, SVM, and Logistic Regression) and checked the accuracy of each.
- Saving the model: Saved the model with the highest accuracy in a .sav file to use it for prediction.
- Model testing: Takes an input URL from the user and returns the predicted and actual flairs. The saved model is called to produce the predicted flair.

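The steps above can be sketched as below, assuming scikit-learn for the classifiers and `pickle` for the .sav file; the README names neither, so both are assumptions. The tiny stopword set and the crude suffix stripper are stand-ins for a real stopword list and stemmer (for example NLTK's), and TF-IDF features are an assumed choice. All function names are hypothetical.

```python
import pickle
import re

STOPWORDS = {"the", "a", "an", "is", "are", "and", "of", "to", "in"}


def crude_stem(word):
    """Strip one common suffix; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, keep alphabetic tokens, drop stopwords, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(crude_stem(t) for t in tokens if t not in STOPWORDS)


def compare_classifiers(texts, flairs, seed=0):
    """Fit three classifiers on a 0.7:0.3 split; return the best and all scores."""
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    cleaned = [preprocess(t) for t in texts]
    x_train, x_test, y_train, y_test = train_test_split(
        cleaned, flairs, test_size=0.3, random_state=seed
    )
    candidates = {
        "naive_bayes": MultinomialNB(),
        "svm": LinearSVC(),
        "logistic_regression": LogisticRegression(max_iter=1000),
    }
    scores, models = {}, {}
    for name, classifier in candidates.items():
        model = make_pipeline(TfidfVectorizer(), classifier)
        model.fit(x_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(x_test))
        models[name] = model
    best = max(scores, key=scores.get)
    return models[best], scores


def save_model(model, path="model.sav"):
    """Serialize the chosen model so it can be reloaded for prediction."""
    with open(path, "wb") as handle:
        pickle.dump(model, handle)
```

Because the saved pipeline bundles the vectorizer with the classifier, new text only needs to pass through `preprocess` before calling `predict`.
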
### How it works:
The model reads the URLs in the file line by line and predicts the flair for each.
- The predictions are stored in a JSON file.

### Output:

Each entry in the output is a key with the predicted flair as its value.

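A rough sketch of the flow described under "How it works" and "Output", under several assumptions: the README does not say what the key is, so the URL itself is used here; the model input is assumed to be the post title joined with its body; and the names `predict_flairs` and `make_predictor` are hypothetical. The `reddit.submission(url=...)` call is praw's own.

```python
import json


def make_predictor(reddit, model):
    """Return a function mapping a post URL to the model's predicted flair."""
    def predict_one(url):
        submission = reddit.submission(url=url)
        # Assumption: the model was trained on title plus body text.
        text = f"{submission.title} {submission.selftext}"
        return str(model.predict([text])[0])
    return predict_one


def predict_flairs(urls_path, out_path, predict_one):
    """Read URLs line by line, predict each flair, and write a JSON mapping."""
    predictions = {}
    with open(urls_path, encoding="utf-8") as handle:
        for line in handle:
            url = line.strip()
            if not url:
                continue  # ignore blank lines
            predictions[url] = predict_one(url)
    with open(out_path, "w", encoding="utf-8") as handle:
        json.dump(predictions, handle, indent=2)
    return predictions


# Usage, assuming a praw client and a model unpickled from the .sav file:
# predict_flairs("urls.txt", "flairs.json", make_predictor(reddit, model))
```

For the model-testing step, the actual flair of a post is available from the same submission object as `link_flair_text`, so it could be returned alongside the prediction.
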

Reddit-scraping-and-flair-detection/WebScrapping and PreProcessing.ipynb

Lines changed: 936 additions & 0 deletions
Large diffs are not rendered by default.
140 KB

Reddit-scraping-and-flair-detection/data.csv

Lines changed: 1217 additions & 0 deletions
Large diffs are not rendered by default.
6.72 MB
Binary file not shown.
58.3 KB
