## Introduction

The data file includes each article's cover photo, which brings the download to about 1 GB. I downloaded the file and unzipped it into the working directory, where the tabular data is stored as 'medium_data.csv'. The images are set aside for a separate image analysis.

---

## Coding
### Preparation

In [21]:
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
from urllib.request import urlopen, Request

In [2]:
dat = pd.read_csv('medium_data.csv')
dat.describe()

statistic,id,claps,reading_time
count,6508.0,6508.0,6508.0
mean,3254.5,311.07606,6.134911
std,1878.842108,950.789896,3.231918
min,1.0,0.0,0.0
25%,1627.75,54.0,4.0
50%,3254.5,115.0,5.0
75%,4881.25,268.25,7.0
max,6508.0,38000.0,55.0


In [3]:
dat.shape

(6508, 10)

In [4]:
# Drop the first column ('id'), which only numbers the rows from 1 to 6508
dat = dat.iloc[:, 1:]
dat.head(4)

index,url,title,subtitle,image,claps,responses,reading_time,publication,date
0,https://towardsdatascience.com/a-beginners-gui...,A Beginner’s Guide to Word Embedding with Gens...,,1.png,850,8,8,Towards Data Science,2019-05-30
1,https://towardsdatascience.com/hands-on-graph-...,Hands-on Graph Neural Networks with PyTorch & ...,,2.png,1100,11,9,Towards Data Science,2019-05-30
2,https://towardsdatascience.com/how-to-use-ggpl...,How to Use ggplot2 in Python,A Grammar of Graphics for Python,3.png,767,1,5,Towards Data Science,2019-05-30
3,https://towardsdatascience.com/databricks-how-...,Databricks: How to Save Files in CSV on Your L...,When I work on Python projects dealing…,4.jpeg,354,0,4,Towards Data Science,2019-05-30


In [5]:
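# Some rows hold the string 'Read' instead of a response count; drop them
dat = dat.drop(index=dat[dat['responses'] == 'Read'].index)
# With only numeric values left, convert both columns to float
dat['reading_time'] = dat['reading_time'].astype('float')
dat['responses'] = dat['responses'].astype('float')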
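
The cell above removes one known bad token ('Read') before casting. As a hedged alternative, the sketch below uses `pd.to_numeric` with `errors="coerce"`, which would also catch any other non-numeric token. The `sample` frame is a made-up miniature, not the real data.

```python
import pandas as pd

# Hypothetical miniature of the data: 'responses' mixes numeric strings
# with a stray 'Read' token, as in the real column.
sample = pd.DataFrame({
    "responses": ["8", "11", "Read", "0"],
    "reading_time": [8, 9, 5, 4],
})

# errors="coerce" turns every value that cannot be parsed into NaN.
sample["responses"] = pd.to_numeric(sample["responses"], errors="coerce")

# Drop the rows that failed to parse, then cast reading_time as well.
cleaned = sample.dropna(subset=["responses"]).copy()
cleaned["reading_time"] = cleaned["reading_time"].astype("float")

print(cleaned)
```

The trade-off is that coercion silently discards anything unexpected, so it is worth checking how many rows were dropped before moving on.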

In [6]:
dat.dtypes

url              object
title            object
subtitle         object
image            object
claps             int64
responses       float64
reading_time    float64
publication      object
date             object
dtype: object

---
### Descriptive Statistics

In [7]:
# Count missing values in each column
for column in dat.columns:
    print(column, ': ', dat[column].isnull().sum())

url :  0
title :  0
subtitle :  3027
image :  146
claps :  0
responses :  0
reading_time :  0
publication :  0
date :  0
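
The same per-column count can be obtained without a loop. This is a minimal sketch on a made-up `sample` frame (an assumption for illustration), using `DataFrame.isnull().sum()`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a few gaps, standing in for the real data.
sample = pd.DataFrame({
    "title": ["a", "b", "c"],
    "subtitle": [np.nan, "x", np.nan],
    "image": ["1.png", np.nan, "3.png"],
})

# isnull() marks each missing cell as True; sum() counts them per column.
missing = sample.isnull().sum()
print(missing)
```

Replacing `sum()` with `mean()` would give the share of missing values per column instead of the count.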


In [8]:
# Distinct values of reading_time (the column at position 6)
dat['reading_time'].unique()

array([ 8.,  9.,  5.,  4., 12., 18.,  6., 21., 14., 10.,  3., 19.,  7.,
       16.,  2., 22., 11., 13., 20., 15.,  1., 40., 32., 17., 27., 31.,
       26.,  0., 24., 25., 23., 33., 55., 36.])

In [9]:
# Article count and mean responses / reading time for each Medium publication
dat.groupby('publication').agg({'responses': ['count','mean'],
                                'reading_time':['count','mean']})

publication,responses_count,responses_mean,reading_time_count,reading_time_mean
Better Humans,28,8.535714,28,13.357143
Better Marketing,242,4.619835,242,6.409091
Data Driven Investor,777,0.37323,777,5.222651
The Startup,3041,1.791845,3041,5.906281
The Writing Cooperative,403,3.163772,403,4.965261
Towards Data Science,1461,1.732375,1461,7.276523
UX Collective,554,1.471119,554,6.032491
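
Because both columns have the same number of non-null rows, the two `count` columns above are identical. A hedged sketch of the same summary using pandas named aggregation follows; it yields flat column names and a single count. The `sample` frame and the output column names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical miniature with two publications.
sample = pd.DataFrame({
    "publication": ["A", "A", "B"],
    "responses": [2.0, 4.0, 1.0],
    "reading_time": [5.0, 7.0, 6.0],
})

# Named aggregation: each keyword becomes an output column,
# defined as (source column, aggregation function).
summary = sample.groupby("publication").agg(
    articles=("responses", "count"),
    mean_responses=("responses", "mean"),
    mean_reading_time=("reading_time", "mean"),
)
print(summary)
```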


In [11]:
dat

index,url,title,subtitle,image,claps,responses,reading_time,publication,date
0,https://towardsdatascience.com/a-beginners-gui...,A Beginner’s Guide to Word Embedding with Gens...,,1.png,850,8.0,8.0,Towards Data Science,2019-05-30
1,https://towardsdatascience.com/hands-on-graph-...,Hands-on Graph Neural Networks with PyTorch & ...,,2.png,1100,11.0,9.0,Towards Data Science,2019-05-30
2,https://towardsdatascience.com/how-to-use-ggpl...,How to Use ggplot2 in Python,A Grammar of Graphics for Python,3.png,767,1.0,5.0,Towards Data Science,2019-05-30
3,https://towardsdatascience.com/databricks-how-...,Databricks: How to Save Files in CSV on Your L...,When I work on Python projects dealing…,4.jpeg,354,0.0,4.0,Towards Data Science,2019-05-30
4,https://towardsdatascience.com/a-step-by-step-...,A Step-by-Step Implementation of Gradient Desc...,One example of building neural…,5.jpeg,211,3.0,4.0,Towards Data Science,2019-05-30
...,...,...,...,...,...,...,...,...,...
6503,https://medium.com/better-marketing/we-vs-i-ho...,“We” vs “I” — How Should You Talk About Yourse...,Basic copywriting choices with a big…,6504.jpg,661,6.0,6.0,Better Marketing,2019-12-05
6504,https://medium.com/better-marketing/how-donald...,How Donald Trump Markets Himself,Lessons from who might be the most popular bra...,6505.jpeg,189,1.0,5.0,Better Marketing,2019-12-05
6505,https://medium.com/better-marketing/content-an...,Content and Marketing Beyond Mass Consumption,How to acquire customers without wasting money...,6506.jpg,207,1.0,8.0,Better Marketing,2019-12-05
6506,https://medium.com/better-marketing/5-question...,5 Questions All Copywriters Should Ask Clients...,Save time and effort by…,6507.jpg,253,2.0,5.0,Better Marketing,2019-12-05


In [15]:
dat.loc[0,'url']


'https://towardsdatascience.com/a-beginners-guide-to-word-embedding-with-gensim-word2vec-model-5970fa56cc92'

In [25]:
header = {'User-Agent': 'Mozilla/5.0'}

req = Request(url=dat.loc[0,'url'], headers = header)
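
The custom header is presumably there because some sites reject the default `Python-urllib` user agent. The sketch below wraps the same step in a small helper and inspects the result without opening a connection. `build_request` and the example URL are hypothetical names for illustration.

```python
from urllib.request import Request


def build_request(url, user_agent="Mozilla/5.0"):
    """Return a Request that carries a browser-like User-Agent header."""
    return Request(url=url, headers={"User-Agent": user_agent})


# Placeholder URL; in the notebook this would be dat.loc[0, 'url'].
example_url = "https://example.com/some-article"
req_example = build_request(example_url)

# Request stores header names capitalized, hence 'User-agent' here.
print(req_example.full_url)
print(req_example.get_header("User-agent"))
```

Nothing is fetched until the request is passed to `urlopen`, so the helper can be reused for every URL in the data.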

In [30]:
req = Request(dat.loc[0,'url'], headers=header)

soup = BeautifulSoup(urlopen(req), 'html.parser')
for div in soup.find_all('div'):  # iterate over every <div> element on the page
    print(div.text)

Get startedOpen in appSign inGet startedFollow535K Followers·Editors' PicksFeaturesExploreContributeAboutGet startedOpen in appA Beginner’s Guide to Word Embedding with Gensim Word2Vec ModelZhi LiMay 30, 2019·8 min readWord embedding is one of the most important techniques in natural language processing(NLP), where words are mapped to vectors of real numbers. Word embedding is capable of capturing the meaning of a word in a document, semantic and syntactic similarity, relation with other words. It also has been widely used for recommender systems and text classification. This tutorial will show a brief introduction of genism word2vec model with an example of generating word embedding for the vehicle make model.Table of Contents1. Introduction of Word2vec2. Gensim Python Library Introduction3. Implementation of word Embedding with Gensim Word2Vec Model3.1 Data Preprocessing:3.2. Genism word2vec Model Training4. Compute Similarities5. T-SNE Visualizations1. Introduction of Word2vecWord2v