### **Web Scraping LinkedIn Articles Data**
##### *Description:* Developed a Python script for web scraping LinkedIn articles data. Utilized the requests library for fetching HTML content, and BeautifulSoup for parsing and extracting relevant information. Extracted questions, contribution counts, timestamps, article descriptions, and tagged topics. Organized the data into a Pandas DataFrame and saved it as a CSV file.
##### *Technologies:* Python, requests, BeautifulSoup, Pandas.
##### *Skills:* Web scraping, data extraction, Python scripting, data manipulation.


In [1]:
import requests
import pandas as pd
from bs4 import BeautifulSoup as bs

###### URL of the LinkedIn Articles page

In [2]:
url='https://www.linkedin.com/pulse/topics/home/?trk=guest_homepage-basic_guest_nav_menu_articles'

###### Fetching the page with requests: a 200 response means the request succeeded and the page HTML is available to scrape

In [3]:
page=requests.get(url)
page

<Response [200]>
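
###### Optional: failing fast on a non-200 response

The status check above is done by eye. A minimal sketch of the same check in code follows, assuming a hypothetical helper name `fetch_html`, an illustrative `User-Agent` string, and a 10-second timeout, none of which appear in the original script. The `get` parameter exists only so the function can be exercised without a network call.

```python
import requests


def fetch_html(url, timeout=10, get=requests.get):
    """Return the page HTML, raising if the server reports an error status."""
    # Hypothetical User-Agent string; some sites reject requests without one.
    headers = {"User-Agent": "Mozilla/5.0 (compatible; example-scraper/0.1)"}
    response = get(url, headers=headers, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx and 5xx responses.
    response.raise_for_status()
    return response.text


# Usage (performs a network request, so it is left commented out):
# data = fetch_html(url)
```

With this in place a blocked or missing page stops the notebook at the fetch step instead of producing empty lists later on.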

In [4]:
data=page.text

In [5]:
soup=bs(data,"html.parser")

###### Printing the prettified HTML content

In [6]:
print(soup.prettify())

<!DOCTYPE html>
<html lang="en">
 <head>
  <meta content="d_career_advice_hub_home" name="pageKey"/>
  <!-- -->
  <!-- -->
  <meta content="en_US" name="locale"/>
  <meta data-app-version="0.0.3176" data-browser-id="15b6d0b3-45c9-44b1-8a88-0675c18fdb0f" data-call-tree-id="AAYQLQ5duGN9VfPcF+/41Q==" data-disable-jsbeacon-pagekey-suffix="false" data-enable-page-view-heartbeat-tracking="" data-member-id="0" data-multiproduct-name="article-ssr-frontend" data-page-instance="urn:li:page:d_career_advice_hub_home;h94hQ7ZwRGCJcZiJsrQr/A==" data-service-name="article-ssr-frontend" id="config"/>
  <link href="https://www.linkedin.com/pulse/topics/home/" rel="canonical"/>
  <!-- -->
  <!-- -->
  <!-- -->
  <!-- -->
  <!-- -->
  <!-- -->
  <link href="https://static.licdn.com/aero-v1/sc/h/al2o9zrvru7aqj8e1x2rzsrca" rel="icon"/>
  <script>
   function getDfd() {let yFn,nFn;const p=new Promise(function(y, n){yFn=y;nFn=n;});p.resolve=yFn;p.reject=nFn;return p;}
          window.lazyloader = getDfd();
 

###### Getting the title of the page

In [7]:
title=soup.find('title') 
print(title.text)

Discover thousands of collaborative articles on 2500+ skills


###### Extracting the content from a specific section of the page

In [8]:
content=soup.find("div",class_="core-section-container__content break-words")
print(content.text)


We’re unlocking community knowledge in an all new way. It starts with an article on a professional topic or skill, written with the help of AI — but it’s not complete without insights and advice from people with real-life experiences. We invited experts to contribute. Learn more



###### Using `find_all` to extract every element that matches a given tag and class

In [9]:
# Extracting questions
Question=soup.find_all("h2",class_="mb-1 overflow-hidden break-words font-sans text-lg font-[500] babybear:text-md")
Questions=[]
for i in Question:
    Questions.append(i.text)
print(Questions)

['Your team is struggling to keep up with productivity. What mentoring software should you consider?', "What are the most effective tools for managing your technical support team's schedules?", "You're managing a virtual team. How can you ensure project management success?", 'Your HR department needs mentoring tools. How do you choose the best ones?', 'What does an electronic specialist do?', "You're mentoring a team on a project. What are the best project management tools to use?", 'What does a food service order clerk do?', 'You’re struggling to improve your public speaking skills. How can you overcome this challenge?', 'How can you create effective teacher workshops?', "You're a creative strategist with limited time. Which content creation tools can you master quickly?", 'How do you become a banquet bartender?', 'What is the most effective way to test ETL workflows for accuracy and completeness?', 'You need to analyze your business content. What are the best tools for the job?', "Yo
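
###### Optional: matching classes with a CSS selector

Passing a long multi-class string to `class_` matches only when the element's `class` attribute is that exact string, so a reordered class list on the page would return nothing. The sketch below uses a small made-up HTML fragment (not the live page) to compare this with `select`, which matches elements carrying all the listed classes in any order. The names `sample` and `soup_demo` are illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: classes from a question heading, listed in a different order.
sample = '<h2 class="break-words mb-1 text-lg">What does an electronic specialist do?</h2>'
soup_demo = BeautifulSoup(sample, "html.parser")

# A multi-class string passed to class_ is compared against the whole attribute value.
exact = soup_demo.find_all("h2", class_="mb-1 break-words text-lg")

# A CSS selector matches any element that has every listed class, in any order.
by_selector = soup_demo.select("h2.mb-1.break-words")

print(len(exact), len(by_selector))
```

Class names containing characters such as `:` or `[` (for example `babybear:text-md`) would need escaping in a selector, so it is simpler to select on the plain ones.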

In [10]:
# Extracting contribution counts
contribution=soup.find_all("span",class_="pr-0.5 pt-0.5")
contributions=[]
for i in contribution:
    contributions.append(i.text.strip())
print(contributions)

['45 contributions', '18 contributions', '38 contributions', '21 contributions', '5 contributions', '21 contributions', '1 contribution', '52 contributions', '12 contributions', '20 contributions', '2 contributions', '4 contributions', '8 contributions', '25 contributions', '5 contributions', '1 contribution', '1 contribution', '43 contributions', '12 contributions', '4 contributions', '3 contributions', '4 contributions', '2 contributions', '2 contributions', '35 contributions', '3 contributions', '6 contributions', '11 contributions', '17 contributions', '7 contributions', '10 contributions', '1 contribution', '6 contributions', '6 contributions', '2 contributions', '6 contributions', '6 contributions', '1 contribution', '10 contributions', '1 contribution', '6 contributions', '4 contributions', '4 contributions', '3 contributions', '3 contributions', '17 contributions', '1 contribution', '13 contributions', '17 contributions', '7 contributions', '4 contributions', '10 contributions'
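
###### Optional: converting contribution labels to integers

The labels above are strings such as `'45 contributions'`, which cannot be sorted or summed numerically. A small sketch of one way to pull out the number follows; `parse_contribution_count` is a hypothetical helper, and the handling of thousands separators is an assumption, since no such value appears in the output shown.

```python
import re


def parse_contribution_count(label):
    """Return the leading integer from a label such as '45 contributions'."""
    # Assumption: large counts, if any, may use commas as thousands separators.
    match = re.match(r"\s*(\d[\d,]*)", label)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


sample_labels = ['45 contributions', '1 contribution']
print([parse_contribution_count(s) for s in sample_labels])  # [45, 1]
```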

In [11]:
# Extracting timestamps
timestamps=soup.find_all("span",class_="before:middot pt-0.5")
timestamp=[]
for i in timestamps:
    timestamp.append(i.text.strip())
print(timestamp)

['6 minutes ago', '3 minutes ago', '1 minute ago', '23 minutes ago', '9 hours ago', '51 minutes ago', '10 hours ago', '3 minutes ago', '1 hour ago', '7 minutes ago', '10 hours ago', '28 minutes ago', '23 minutes ago', '10 minutes ago', '1 hour ago', '10 hours ago', '10 hours ago', '1 minute ago', '1 hour ago', '4 hours ago', '4 hours ago', '14 minutes ago', '5 hours ago', '37 minutes ago', '8 minutes ago', '5 hours ago', '34 minutes ago', '1 hour ago', '30 minutes ago', '2 hours ago', '2 hours ago', '10 hours ago', '5 hours ago', '2 hours ago', '3 hours ago', '2 hours ago', '2 hours ago', '10 hours ago', '15 minutes ago', '10 hours ago', '43 minutes ago', '2 hours ago', '3 hours ago', '10 hours ago', '3 hours ago', '1 hour ago', '10 hours ago', '2 hours ago', '2 minutes ago', '4 hours ago', '4 hours ago', '8 minutes ago', '10 hours ago', '3 hours ago', '10 hours ago', '2 hours ago', '11 hours ago', '11 hours ago', '15 minutes ago', '4 hours ago', '3 hours ago', '52 minutes ago', '51 mi
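
###### Optional: turning relative timestamps into durations

Values like `'6 minutes ago'` are relative to the moment of scraping, so they lose meaning once saved. The sketch below converts them to `timedelta` objects; `parse_relative_age` is a hypothetical helper, and the `day` and `week` units are assumptions, because only minutes and hours appear in the output above.

```python
import re
from datetime import timedelta

# Maps the singular unit in the label to the matching timedelta keyword.
_UNITS = {"minute": "minutes", "hour": "hours", "day": "days", "week": "weeks"}


def parse_relative_age(label):
    """Return a timedelta for labels like '6 minutes ago', or None if unrecognised."""
    match = re.fullmatch(r"(\d+)\s+(minute|hour|day|week)s?\s+ago", label.strip())
    if match is None:
        return None
    amount, unit = int(match.group(1)), match.group(2)
    return timedelta(**{_UNITS[unit]: amount})


print(parse_relative_age('6 minutes ago'))  # 0:06:00
```

Subtracting the result from the time the page was fetched (for example `datetime.now()` recorded next to `requests.get`) gives an approximate absolute timestamp.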

In [12]:
# Extracting article descriptions
desc=soup.find_all("p",class_="content-description mt-0.5 break-words font-sans text-sm font-normal babybear:text-xs")
description=[]
for i in desc:
    description.append(i.text.strip())
print(description)

["Learn how to choose the best mentoring software for your team's productivity. Consider features, ease of use, support, cost, reviews, and trial.", "Learn about the best tools to automate, optimize, and streamline your technical support team's schedules, and improve your team's productivity, satisfaction, and…", 'Learn how to use facilitation skills and tools to manage your virtual team and project effectively and successfully.', 'Learn how to find the best mentoring tools for your HR department. Discover how to assess your needs, compare your options, and test your choices.', 'Learn what an electronic specialist does, what skills and qualifications you need, and what industries and sectors you can work in.', 'Learn how to use the best project management tools to mentor your team on a project. Discover how to define your goals, choose a platform, use a framework, and more.', 'Learn what a food service order clerk does, what skills and qualifications are required, and what are the bene

In [13]:
# Extracting tagged topics (only the first 100 links are kept)
topic=soup.find_all("a",class_="tagged-topic !text-color-text")
topics=[]
for i in topic[:100]:
    topics.append(i.text.strip())
print(topics)
len(topics)

['Mentoring', 'Soft Skills', 'Technical Support', 'IT Services', 'Facilitation', 'Soft Skills', 'Mentoring', 'Soft Skills', 'Computer Engineering', 'Engineering', 'Mentoring', 'Soft Skills', 'F&B Operations', 'Food and Beverage Management', 'Business Communications', 'Business Administration', 'Teaching', 'Education', 'Creative Strategy', 'Marketing', 'F&B Operations', 'Food and Beverage Management', 'Database Administration', 'Engineering', 'Content Development', 'Content Management', 'Career Development Coaching', 'HR Management', 'Design', 'Art', 'Environmental Design', 'Sustainability', 'Leadership in Energy and Environmental Design (LEED)', 'Content Marketing', 'Marketing', 'Project Leadership', 'Soft Skills', 'Systems Design', 'Engineering', 'Event Planning', 'Marketing', 'Business Operations', 'Business Administration', 'Augmented Reality (AR)', 'Engineering', 'Environmental Services', 'Public Administration', 'Digital Strategy', 'Marketing', 'Vendor Negotiation', 'Soft Skills',

100
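
###### Optional: keeping topics aligned with their article

The topic list above appears to hold more than one tag per article (`'Mentoring', 'Soft Skills'`, then `'Technical Support', 'IT Services'`, and so on), so a flat list cut to 100 entries does not line up row-for-row with the questions. One way to keep them together is to loop over each article container and collect its tags there. The sketch below runs on a made-up HTML fragment; the `article-card` wrapper class and the `extract_cards` helper are assumptions, so the real container has to be looked up in the page's HTML.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the "article-card" wrapper class is an assumption,
# not taken from the page; inspect the real HTML to find the actual container.
sample_html = """
<div class="article-card">
  <h2>Your team is struggling to keep up with productivity. What mentoring software should you consider?</h2>
  <a class="tagged-topic">Mentoring</a>
  <a class="tagged-topic">Soft Skills</a>
</div>
<div class="article-card">
  <h2>What are the most effective tools for managing your technical support team's schedules?</h2>
  <a class="tagged-topic">Technical Support</a>
  <a class="tagged-topic">IT Services</a>
</div>
"""


def extract_cards(html):
    """Return one dict per article card, with its topics joined into one string."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.find_all("div", class_="article-card"):
        heading = card.find("h2")
        tags = [a.get_text(strip=True) for a in card.find_all("a", class_="tagged-topic")]
        rows.append({
            "Questions": heading.get_text(strip=True) if heading else None,
            "Topics": ", ".join(tags),
        })
    return rows


rows = extract_cards(sample_html)
print(rows[1]["Topics"])  # Technical Support, IT Services
```

A list of dicts like `rows` can be passed straight to `pd.DataFrame(rows)`, which gives one row per article.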

###### Creating a DataFrame

In [14]:
df=pd.DataFrame()

In [15]:
#Adding data in dataframe
df["Questions"]=Questions
df["Description"]=description
df["Contributions"]=contributions
df["Timestamp"]=timestamp
df["Topics"]=topics
df

| | Questions | Description | Contributions | Timestamp | Topics |
|---|---|---|---|---|---|
| 0 | Your team is struggling to keep up with produc... | Learn how to choose the best mentoring softwar... | 45 contributions | 6 minutes ago | Mentoring |
| 1 | What are the most effective tools for managing... | Learn about the best tools to automate, optimi... | 18 contributions | 3 minutes ago | Soft Skills |
| 2 | You're managing a virtual team. How can you en... | Learn how to use facilitation skills and tools... | 38 contributions | 1 minute ago | Technical Support |
| 3 | Your HR department needs mentoring tools. How ... | Learn how to find the best mentoring tools for... | 21 contributions | 23 minutes ago | IT Services |
| 4 | What does an electronic specialist do? | Learn what an electronic specialist does, what... | 5 contributions | 9 hours ago | Facilitation |
| ... | ... | ... | ... | ... | ... |
| 95 | You’re starting a new project. What product de... | Learn about the most popular and useful produc... | 6 contributions | 10 hours ago | Transportation Planning |
| 96 | You're struggling to manage employee relations... | Learn how HR workflow management tools can hel... | 21 contributions | 1 hour ago | Engineering |
| 97 | You need to share your design system with your... | Learn how to communicate your design vision, c... | 13 contributions | 2 minutes ago | Business Development |
| 98 | Your sales team is struggling to find new lead... | Discover the best prospecting tools that can h... | 5 contributions | 17 minutes ago | Business Administration |
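
###### Optional: checking column lengths before building the DataFrame

Assigning a list to a column of a DataFrame that already has rows raises a `ValueError` if the list length differs from the index length, which can happen when one selector matches fewer elements than another. A minimal sketch of an explicit check follows; `build_frame` is a hypothetical helper and the sample values are copied from the outputs above.

```python
import pandas as pd


def build_frame(columns):
    """Build a DataFrame only if every column list has the same length."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        # Report every length so the mismatched selector is easy to spot.
        raise ValueError(f"Column lengths differ: {lengths}")
    return pd.DataFrame(columns)


demo = build_frame({
    "Questions": ["What does an electronic specialist do?"],
    "Contributions": ["5 contributions"],
})
print(demo.shape)  # (1, 2)
```

In the notebook this could be called as `build_frame({"Questions": Questions, "Description": description, ...})` in place of the column-by-column assignment.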


###### Saving the DataFrame as a CSV file

In [16]:
df.to_csv("linkedin_data.csv",index=False)

###### Reading the saved CSV file 

In [17]:
df_read=pd.read_csv("linkedin_data.csv")
df_read

| | Questions | Description | Contributions | Timestamp | Topics |
|---|---|---|---|---|---|
| 0 | Your team is struggling to keep up with produc... | Learn how to choose the best mentoring softwar... | 45 contributions | 6 minutes ago | Mentoring |
| 1 | What are the most effective tools for managing... | Learn about the best tools to automate, optimi... | 18 contributions | 3 minutes ago | Soft Skills |
| 2 | You're managing a virtual team. How can you en... | Learn how to use facilitation skills and tools... | 38 contributions | 1 minute ago | Technical Support |
| 3 | Your HR department needs mentoring tools. How ... | Learn how to find the best mentoring tools for... | 21 contributions | 23 minutes ago | IT Services |
| 4 | What does an electronic specialist do? | Learn what an electronic specialist does, what... | 5 contributions | 9 hours ago | Facilitation |
| ... | ... | ... | ... | ... | ... |
| 95 | You’re starting a new project. What product de... | Learn about the most popular and useful produc... | 6 contributions | 10 hours ago | Transportation Planning |
| 96 | You're struggling to manage employee relations... | Learn how HR workflow management tools can hel... | 21 contributions | 1 hour ago | Engineering |
| 97 | You need to share your design system with your... | Learn how to communicate your design vision, c... | 13 contributions | 2 minutes ago | Business Development |
| 98 | Your sales team is struggling to find new lead... | Discover the best prospecting tools that can h... | 5 contributions | 17 minutes ago | Business Administration |
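
###### Optional: verifying the CSV round trip

Reading the file back and viewing it confirms the save worked, but a programmatic comparison is stricter. The sketch below writes a one-row frame to an in-memory buffer instead of a file and checks that the values survive, including a description containing commas. The names `original`, `buffer`, and `restored` are illustrative, and the row is copied from the output above.

```python
import io

import pandas as pd

original = pd.DataFrame({
    "Questions": ["What does an electronic specialist do?"],
    "Description": ["Learn what an electronic specialist does, what..."],
    "Contributions": ["5 contributions"],
})

# Write to an in-memory buffer; index=False keeps the row index out of the CSV.
buffer = io.StringIO()
original.to_csv(buffer, index=False)
buffer.seek(0)

restored = pd.read_csv(buffer)
same = restored.values.tolist() == original.values.tolist()
print(same)
```

Because `index=False` is used when writing, the restored frame has exactly the original columns and no extra index column.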
