
Scraping from a list of links #147

Open
HABER7789 opened this issue Mar 29, 2023 · 8 comments

Comments

@HABER7789

Hi, I am really impressed by the scraper you have built and glad to be able to use it. I am running into an issue scraping a list of people from an Excel file that contains only profile links.

The scraper scrapes the first link, and afterwards it does navigate to the next profile (I can see this in the Chrome window), but then it throws an exception and cannot scrape any further, leaving me with data for only the first person.

I would really appreciate your help with this; my code is attached below.

```python
from linkedin_scraper import Person, actions
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
import pandas as pd  # openpyxl must be installed for read_excel

chrome_options = Options()
chrome_options.add_argument("--headless")  # remove to watch the browser

# Pass the driver path via Service (executable_path is deprecated) and
# use a raw string so the backslash is not treated as an escape.
service = Service(r"C:\chromedriver.exe")
driver = webdriver.Chrome(service=service, options=chrome_options)
driver.set_window_size(1920, 1080)

email = "Email"
password = "password"
actions.login(driver, email, password)  # prompts in the terminal if omitted

dataframe1 = pd.read_excel("People.xlsx")
links = list(dataframe1["PeopleLinks"])

ExtractedList = []

for link in links:
    person = Person(link, driver=driver, scrape=False)
    person.scrape(close_on_complete=False)
    ExtractedList.append(person)

for person in ExtractedList:
    print(person)
```
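Until the underlying selector issue is fixed, the loop above can be made resilient so that one failing profile does not abort the whole run. This is a minimal sketch, not part of linkedin_scraper: `scrape_all` is a hypothetical helper that takes any per-link scrape function, catches exceptions per link, and reports failures separately.

```python
def scrape_all(links, scrape_one):
    """Apply scrape_one to each link, skipping links that raise.

    Returns (results, failures) so a single bad profile does not
    abort the whole run -- the behaviour described in this issue.
    """
    results, failures = [], []
    for link in links:
        try:
            results.append(scrape_one(link))
        except Exception as exc:  # selenium raises many exception types
            failures.append((link, exc))
    return results, failures
```

With the code above, the call would look like `scrape_all(links, lambda l: Person(l, driver=driver))`, and the failures list shows which profiles need a retry.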


@joeyism
Owner

joeyism commented Mar 29, 2023

What's the error that you get?

@HABER7789
Author

> What's the error that you get?

Hey there! [screenshot of the traceback attached]

@HABER7789
Author

> What's the error that you get?

This is the error I am getting. My question is: if it can scrape one person, shouldn't it do the same for the others? Please correct me if I am wrong anywhere. Thanks!

@rizwankaz

Hey! I'm also getting the same error; is it just that the CSS selector has changed?

@lusifer021
Contributor

#158
This PR solves this issue and can parse multiple person links.

@HABER7789
Author

> #158 This PR solves this issue and can parse multiple person links.

Thanks a ton, it works! Really appreciate your help here. Cheers, @lusifer021!

@lusifer021
Contributor

> Thanks a ton, it works! Really appreciate your help here.

You're welcome, @HABER7789!

@jakalfayan

jakalfayan commented Apr 25, 2023

@joeyism I'm doing this as well and wanted to ask how to exclude the employee scrape when scraping companies. My current code is below; I have a long list of companies and don't need the employees piece. Let me know.

```python
import pandas as pd
from linkedin_scraper import Person, Company, actions
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

ser = Service(r"c:\se\chromedriver.exe")
op = webdriver.ChromeOptions()
driver = webdriver.Chrome(service=ser, options=op)

email = "XXXX@gmail.com"
password = "XXXXXXXXXXX"
actions.login(driver, email, password)  # prompts in the terminal if omitted

dataframe1 = pd.read_csv("company_Linkedin_upload.csv")  # mismatched quotes fixed
links = list(dataframe1["linkedin url"])

ExtractedList = []

for link in links:
    # get_employees=False skips the employee scrape entirely
    company = Company(link, driver=driver, scrape=False, get_employees=False)
    company.scrape(close_on_complete=False)
    ExtractedList.append(company)
    print(company)

for company in ExtractedList:
    print(company)
```
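If you want the results in a spreadsheet rather than printed to the terminal, the scraped objects can be flattened into a DataFrame. This is only a sketch: the attribute names in `FIELDS` are assumptions, so inspect a scraped `Company` object in your version of linkedin_scraper before relying on them.

```python
import pandas as pd

# Hypothetical attribute names -- verify against your linkedin_scraper version.
FIELDS = ["name", "about_us", "website", "industry", "company_size"]

def companies_to_frame(companies):
    """Flatten scraped objects into a DataFrame, one row per company.

    getattr with a default keeps the export working even when a
    profile is missing one of the fields.
    """
    rows = [{f: getattr(c, f, None) for f in FIELDS} for c in companies]
    return pd.DataFrame(rows, columns=FIELDS)
```

Then `companies_to_frame(ExtractedList).to_csv("companies.csv", index=False)` writes everything out in one go.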
