Scraping several sites at the same time #1

racindustries · 2018-07-27T06:07:30Z

When I run the code, only the first news website in the JSON list seems to be downloaded and parsed. Do you have any suggestions?

ghost · 2018-08-30T05:14:25Z

@racindustries Same here. I've looked around to see if anyone has a solution but haven't found one.
Were you able to work around this issue?

Susmithap3 · 2018-09-30T19:21:06Z

Same here, I also need help.

ivanovishado · 2018-10-22T06:40:04Z

Can any of you please share the code that you're using?
I used the code from this repo and it worked fine with the JSON list I gave it.

ghost · 2018-10-22T16:43:56Z

It's been a while on my end, but I used exactly the same code from holwech and only changed the news sites.

ivanovishado · 2018-10-23T02:27:24Z

@Civmwa can you please share the JSON list you used so I can try to reproduce the error?

ghost · 2018-10-24T09:30:10Z

@ivanovishado
{
  "The Standard": {
    "link": "https://www.standardmedia.co.ke/business"
  },
  "bbc": {
    "rss": "http://feeds.bbci.co.uk/news/rss.xml",
    "link": "http://www.bbc.com/"
  },
  "theguardian": {
    "rss": "https://www.theguardian.com/uk/rss",
    "link": "https://www.theguardian.com/international"
  },
  "breitbart": {
    "link": "http://www.breitbart.com/"
  },
  "infowars": {
    "link": "https://www.infowars.com/"
  },
  "foxnews": {
    "link": "http://www.foxnews.com/"
  },
  "nbcnews": {
    "link": "http://www.nbcnews.com/"
  },
  "washingtonpost": {
    "rss": "http://feeds.washingtonpost.com/rss/world",
    "link": "https://www.washingtonpost.com/"
  }
}
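For reference, here is a minimal sketch of walking over every entry in a config like the one above instead of stopping after the first. It assumes each site has a "link" and an optional "rss" key (as in this list) and that the feed should be preferred when present, which the optional "rss" key suggests but the thread does not confirm. plan_sources, scrape_all and the scrape_site callback are hypothetical names, not functions from the repo. Errors are caught per site so that one failing site cannot stop the remaining ones.

```python
import json


def plan_sources(sites):
    """Pick one source per site, preferring the RSS feed when one is given.

    Assumes the config shape from this thread:
    {"site name": {"link": ..., "rss": ... (optional)}, ...}
    """
    plan = []
    for name, entry in sites.items():
        if "rss" in entry:
            plan.append((name, "rss", entry["rss"]))
        else:
            plan.append((name, "link", entry["link"]))
    return plan


def scrape_all(sites, scrape_site):
    """Call the (hypothetical) scrape_site(name, kind, url) for every site.

    A failure on one site is recorded in `errors` and the loop moves on,
    so a single bad site cannot prevent the others from being scraped.
    """
    results, errors = {}, {}
    for name, kind, url in plan_sources(sites):
        try:
            results[name] = scrape_site(name, kind, url)
        except Exception as exc:  # keep going with the next site
            errors[name] = repr(exc)
    return results, errors
```

In practice the sites dict would come from json.load() on the JSON file, and scrape_site might call feedparser.parse(url) for "rss" entries and newspaper.build(url) for "link" entries; checking the returned errors dict shows which sites, if any, were skipped.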

ivanovishado · 2018-10-25T04:22:30Z

@Civmwa NewsScraper.py worked fine for me; here's the output file as proof: https://pastebin.com/ndLPb7QL

Tested on Windows 10 with Python 3.6.2.

ghost · 2018-10-25T05:26:56Z

Hi Ivan, I'm not entirely sure what happened between when I sent it to you and now, but I ran it and it works. LOL. One small question though: how would I get it to print a summary of the article?

ivanovishado · 2018-10-26T03:21:08Z

> Not entirely sure what happened between when I sent it to you and now, but I ran it and it works. LOL.

@Civmwa lol

> How would I get to print a summary of the article?

You need to call content.nlp() right after content.parse(); the summary is then available in the content.summary attribute.
Keep in mind that nlp() adds some processing time and the summary won't be perfect.
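A minimal sketch of that call order, assuming the newspaper3k Article API (download(), parse(), nlp() and the summary attribute). The summarize helper is a hypothetical name; it accepts any Article-like object, so it can be tried without network access.

```python
def summarize(article):
    """Return the summary of a newspaper3k-style Article object.

    The calls must happen in this order: parse() needs the HTML that
    download() fetched, and nlp() needs the text that parse() extracted.
    """
    article.download()
    article.parse()
    # nlp() adds processing time; in newspaper3k it relies on NLTK's
    # "punkt" tokenizer data, which has to be downloaded beforehand.
    article.nlp()
    return article.summary
```

With newspaper3k installed, summarize(Article(url)) (after from newspaper import Article) should return the summary as a string; the same nlp() call also fills in article.keywords.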

ghost · 2018-10-27T18:05:15Z

Thanks Ivan, much appreciated.

ivanovishado · 2018-10-29T01:46:53Z

@Civmwa You're welcome.
I believe this issue can be closed now, @racindustries.

ghost · 2018-10-29T04:21:05Z

Yes.
