Scraping several sites at the same time #1

Open
racindustries opened this issue Jul 27, 2018 · 12 comments

Comments

@racindustries

When I run the code, only the first news website in the JSON list seems to be downloaded and parsed. Do you have any suggestions?

@ghost

ghost commented Aug 30, 2018

@racindustries Same here. I've looked around to see if anyone has a solution but haven't found one.
Were you able to work around this issue?

@Susmithap3

Same here, I also need help.

@ivanovishado

Can any of you please share the code that you're using?
I used the code from this repo and it worked fine with the JSON list I provided.

@ghost

ghost commented Oct 22, 2018 via email

@ivanovishado

@Civmwa Can you please share the JSON list you used so I can try to reproduce the error?

@ghost

ghost commented Oct 24, 2018

@ivanovishado
{
    "The Standard": {
        "link": "https://www.standardmedia.co.ke/business"
    },
    "bbc": {
        "rss": "http://feeds.bbci.co.uk/news/rss.xml",
        "link": "http://www.bbc.com/"
    },
    "theguardian": {
        "rss": "https://www.theguardian.com/uk/rss",
        "link": "https://www.theguardian.com/international"
    },
    "breitbart": {
        "link": "http://www.breitbart.com/"
    },
    "infowars": {
        "link": "https://www.infowars.com/"
    },
    "foxnews": {
        "link": "http://www.foxnews.com/"
    },
    "nbcnews": {
        "link": "http://www.nbcnews.com/"
    },
    "washingtonpost": {
        "rss": "http://feeds.washingtonpost.com/rss/world",
        "link": "https://www.washingtonpost.com/"
    }
}
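
For anyone debugging the "only the first site" symptom, a quick sanity check is to confirm that every entry in a list like the one above is actually visited. The following is a minimal sketch, not the repo's code: the helper name `list_sources` and the rule "use `rss` when present, otherwise `link`" are assumptions made for illustration, and the sample is a shortened copy of the list above.

```python
import json

# Shortened copy of the JSON list posted above (three of the eight entries).
SAMPLE = """
{
    "The Standard": {"link": "https://www.standardmedia.co.ke/business"},
    "bbc": {
        "rss": "http://feeds.bbci.co.uk/news/rss.xml",
        "link": "http://www.bbc.com/"
    },
    "breitbart": {"link": "http://www.breitbart.com/"}
}
"""


def list_sources(config_text):
    """Return one (name, mode, url) tuple per site in the JSON text.

    Hypothetical helper: it assumes a scraper would prefer the "rss"
    URL when one is given and fall back to "link" otherwise.
    """
    companies = json.loads(config_text)
    plan = []
    for name, value in companies.items():
        if "rss" in value:
            plan.append((name, "rss", value["rss"]))
        else:
            plan.append((name, "link", value["link"]))
    return plan


if __name__ == "__main__":
    # Every site should appear once; a missing name points at the loop,
    # not at the JSON file.
    for name, mode, url in list_sources(SAMPLE):
        print(f"{name}: {mode} -> {url}")
```

If every site shows up here but not in the scraper's output, the problem is more likely in the download or parsing step than in the JSON list itself.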

@ivanovishado

@Civmwa NewsScraper.py worked fine for me; here's the output file as proof.

Tested on Windows 10 with Python 3.6.2.

@ghost

ghost commented Oct 25, 2018 via email

@ivanovishado

Not entirely sure what happened between when I sent it to you and now, but I ran it and it works. LOL.

@Civmwa lol

How would I get it to print a summary of the article?

Add content.nlp() right after content.parse(); after that, the summary is available as content.summary.
Keep in mind that nlp() adds some processing time and the summary won't be perfect.
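
The call order described above can be sketched as follows. This is only an illustration: it assumes `content` is an article object exposing `parse()`, `nlp()`, and a `summary` attribute, as the reply describes. `StubArticle` and `summarize` are hypothetical names used so the snippet runs on its own; in the scraper, the real article object would take the stub's place.

```python
class StubArticle:
    """Hypothetical stand-in for the scraper's article object (demo only)."""

    def __init__(self, text):
        self.text = text
        self.summary = ""
        self._parsed = False

    def parse(self):
        # The real object would extract title, text, etc. here.
        self._parsed = True

    def nlp(self):
        # Mirrors the ordering constraint: nlp() comes after parse().
        if not self._parsed:
            raise RuntimeError("call parse() before nlp()")
        # Crude placeholder summary: just the first sentence.
        self.summary = self.text.split(". ")[0].rstrip(".") + "."


def summarize(content):
    """Run parse(), then nlp(), and return the resulting summary."""
    content.parse()
    content.nlp()  # extra processing step; this is where the time goes
    return content.summary


if __name__ == "__main__":
    article = StubArticle("Markets rose today. Analysts were surprised.")
    print(summarize(article))
```

Because nlp() is the slow step, it may be worth calling it only for the articles whose summary is actually needed.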

@ghost

ghost commented Oct 27, 2018 via email

@ivanovishado

@Civmwa You're welcome.
I believe this issue can be closed now, @racindustries.

@ghost

ghost commented Oct 29, 2018 via email
