ITHiringTrendAnalysis

The project performs text analysis on job openings in the IT industry and identifies the technical keywords in them. It determines the most in-demand technical skills in the industry and presents them visually in Tableau.

Keywords extraction

Identified 1000 tech keywords from PDFs by using the TF-IDF algorithm, which weights each word's frequency in a document against how common the word is across all documents.
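A minimal sketch of TF-IDF keyword ranking, using only the standard library. The sample documents, the tokenizer pattern, and the `top_keywords` helper are illustrative assumptions; the notebook may use a different tokenizer, weighting variant, or library.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into tokens (keeps '+' and '#' for C++ / C#)."""
    return re.findall(r"[a-z][a-z0-9+#]*", text.lower())


def tfidf_scores(documents):
    """Return one {term: score} dict per document, with score = tf * log(N / df)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def top_keywords(documents, limit):
    """Rank terms by their best score in any document and keep the top `limit`."""
    best = {}
    for doc_scores in tfidf_scores(documents):
        for term, score in doc_scores.items():
            best[term] = max(best.get(term, 0.0), score)
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:limit]]


# Illustrative documents, not taken from the project's PDFs.
docs = [
    "Experience with Python and SQL is required",
    "Experience with Java and Spring is required",
    "Experience with Python and AWS is required",
]
print(top_keywords(docs, 4))
```

Words that appear in every document, such as "experience", score zero, which is why TF-IDF tends to surface distinctive technical terms over boilerplate.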

Scraping

Created a raw dataset by web scraping job descriptions from a search engine with BeautifulSoup.
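A minimal sketch of parsing job postings with BeautifulSoup. The HTML below is made up for illustration: the tag names, class names, and the `parse_jobs` helper are assumptions, since the README does not say which site was scraped or how its pages are structured.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real site's tags and classes will differ.
SAMPLE_HTML = """
<div class="job">
  <h2 class="title">Data Engineer</h2>
  <span class="company">Acme Corp</span>
  <div class="description">Build pipelines with <b>Python</b> and AWS.</div>
</div>
<div class="job">
  <h2 class="title">Backend Developer</h2>
  <span class="company">Globex</span>
  <div class="description">Design REST services in Java.</div>
</div>
"""


def parse_jobs(html):
    """Extract one dict per job card from the page markup."""
    soup = BeautifulSoup(html, "html.parser")
    jobs = []
    for card in soup.find_all("div", class_="job"):
        jobs.append({
            "title": card.find("h2", class_="title").get_text(strip=True),
            "company": card.find("span", class_="company").get_text(strip=True),
            # Join nested text nodes with spaces so inline tags do not glue words together.
            "description": card.find("div", class_="description").get_text(" ", strip=True),
        })
    return jobs


jobs = parse_jobs(SAMPLE_HTML)
for job in jobs:
    print(job["title"], "|", job["description"])
```

In a real run the HTML would come from an HTTP response rather than a string constant.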

Data storage

Stored the data as CSV files in an S3 bucket, using AWS credentials configured on the local system.

AWS Configuration with aws cli

pip install awscli

aws configure

AWS Access Key ID : ACCESS_KEY

AWS Secret Access Key : SECRET_KEY

Default region name : 'Your Region'
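With credentials configured, writing a CSV to S3 could look like the sketch below. The bucket name, object key, and helper names are hypothetical. A small stand-in client is used so the example runs without AWS access; with boto3 installed, `boto3.client("s3")` exposes the same `put_object(Bucket=..., Key=..., Body=...)` call and picks up the credentials stored by `aws configure`.

```python
import csv
import io


def rows_to_csv_bytes(rows, fieldnames):
    """Serialize a list of dicts to UTF-8 CSV bytes with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


def upload_csv(s3_client, bucket, key, rows, fieldnames):
    """Upload the rows as one CSV object and return the number of bytes sent."""
    body = rows_to_csv_bytes(rows, fieldnames)
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    return len(body)


class FakeS3Client:
    """Offline stand-in that records uploads; replace with boto3.client("s3")."""

    def __init__(self):
        self.objects = {}

    def put_object(self, Bucket, Key, Body):
        self.objects[(Bucket, Key)] = Body


# Hypothetical bucket and key names.
rows = [{"title": "Data Engineer", "company": "Acme"}]
client = FakeS3Client()
size = upload_csv(client, "example-bucket", "jobs/raw_jobs.csv", rows, ["title", "company"])
print(size, "bytes uploaded")
```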

Exploratory data analysis

Explored the dataset by checking for null values, the types of data collected, and the text of the extracted job descriptions.
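A rough sketch of the null-value and type checks, written with the standard library so it runs anywhere. The column names and sample rows are invented for illustration, and the `summarize` helper is an assumption rather than the notebook's code.

```python
from collections import Counter


def summarize(rows):
    """Report, per column, how many values are missing and which types occur."""
    columns = sorted({name for row in rows for name in row})
    summary = {}
    for name in columns:
        values = [row.get(name) for row in rows]
        # Treat None and blank strings as missing.
        missing = sum(1 for v in values if v is None or str(v).strip() == "")
        types = Counter(type(v).__name__ for v in values if v is not None)
        summary[name] = {"missing": missing, "types": dict(types)}
    return summary


# Illustrative rows, not taken from the scraped dataset.
rows = [
    {"title": "Data Engineer", "description": "Python and AWS", "salary": 120000},
    {"title": "Backend Developer", "description": "", "salary": None},
    {"title": None, "description": "Java and Spring", "salary": 95000},
]
for column, info in summarize(rows).items():
    print(column, info)
```

If the data is loaded into a pandas DataFrame `df`, the same checks are commonly written as `df.isnull().sum()` and `df.dtypes`.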

Word Count

Identified the tech word tokens in each job description by comparing its words with the finalized keyword list

Maintained the count of each keyword for every job description

Merged the counts with the remaining data to create a dataset with one additional column per keyword holding its count
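The three steps above can be sketched as follows. The keyword list, the sample job, and the helper names are illustrative assumptions. The sketch handles single-word keywords only, so multi-word terms would need phrase matching.

```python
import re
from collections import Counter


def keyword_counts(description, keywords):
    """Count how often each keyword appears as a token in the description."""
    tokens = re.findall(r"[a-z][a-z0-9+#]*", description.lower())
    counts = Counter(tokens)
    return {keyword: counts[keyword] for keyword in keywords}


def add_keyword_columns(jobs, keywords):
    """Return new rows that keep the original fields and add one count column per keyword."""
    return [{**job, **keyword_counts(job["description"], keywords)} for job in jobs]


# Hypothetical keyword list and job row.
keywords = ["python", "sql", "aws", "java"]
jobs = [{"title": "Data Engineer", "description": "Python, SQL and more Python on AWS"}]
enriched = add_keyword_columns(jobs, keywords)
print(enriched[0])
```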

Data Streaming

Implemented the live streaming of data with AWS Kinesis

Step 1: Created a Kinesis data stream with one shard to fetch the records, using '|' as the delimiter within each record.

Step 2: Created a Kinesis Data Firehose delivery stream that takes its input from the data stream. Enabled server-side encryption (SSE) under data encryption and added the IP address for the region under subnet groups to give VPC access.

Step 3: Created a Lambda function that stores the data as CSV on AWS S3.

Step 4: Created a Redshift database to store the data.

Step 5: Wrote a COPY command in the delivery stream configuration to copy the CSV records from Firehose to the Redshift database.
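A hedged sketch of Steps 1 and 3: a producer that puts '|' delimited records on the data stream, and a Lambda function written as a Firehose data-transformation handler that rewrites each record as a CSV line. This assumes the Lambda is attached to the delivery stream as a transformation, which the README does not confirm. The stream name, field values, and helper names are hypothetical. `send_record` expects a boto3 Kinesis client and uses its `put_record` call.

```python
import base64
import csv
import io

DELIMITER = "|"


def encode_record(fields):
    """Join field values with '|' and end the record with a newline."""
    return (DELIMITER.join(str(field) for field in fields) + "\n").encode("utf-8")


def send_record(kinesis_client, stream_name, fields, partition_key):
    """Put one delimited record on the data stream."""
    return kinesis_client.put_record(
        StreamName=stream_name,
        Data=encode_record(fields),
        PartitionKey=partition_key,
    )


def lambda_handler(event, context):
    """Firehose transformation: turn each '|' delimited record into a CSV line."""
    output = []
    for record in event["records"]:
        line = base64.b64decode(record["data"]).decode("utf-8").rstrip("\n")
        buffer = io.StringIO()
        # Naive split: assumes '|' never occurs inside a field value.
        csv.writer(buffer, lineterminator="\n").writerow(line.split(DELIMITER))
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(buffer.getvalue().encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}


# Local round trip with a hand-built Firehose-style event (no AWS access needed).
payload = encode_record(["Data Engineer", "Acme, Inc.", 3])
event = {"records": [{"recordId": "1", "data": base64.b64encode(payload).decode("utf-8")}]}
result = lambda_handler(event, None)
csv_line = base64.b64decode(result["records"][0]["data"]).decode("utf-8")
print(csv_line, end="")
```

Using the `csv` writer rather than a plain string replace means a field that contains a comma is quoted, so the COPY step reads the same number of columns for every row.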


pratiksha-sawant/IT-Hiring-Trend-Analysis
