Fixes #1252: Add Scraper for questions of QuoraScraper #1255
Force-pushed from cf66e91 to 17ca182
@Orbiter @sudheesh001 @daminisatya @jig08 @mariobehling @singhpratyush @kavithaenair @SKrPl @hemantjadon @Achint08 @sarishinohara @djmgit I am stuck on a weird problem. In the method getData(), the scraped data is returned to here: https://github.com/loklak/loklak_server/pull/1255/files#diff-37e4b654b188b27f598f203ceb234782R83 , and after this the method returns control.
Force-pushed from b1ec82a to c8b0b58
Codecov Report
@@             Coverage Diff              @@
##           development   #1255    +/-   ##
===============================================
- Coverage         9.08%   8.99%   -0.1%
+ Complexity         397     396      -1
===============================================
  Files              200     200
  Lines            17359   17506    +147
  Branches          3249    3267     +18
===============================================
- Hits              1577    1574      -3
- Misses           15475   15623    +148
- Partials           307     309      +2
Force-pushed from 7562434 to 3bb0261
Again the same issue. You are partially providing fixes for two or three issues together and creating multiple PRs. If you think each fix is different, then the problem lies in how you are dividing a large problem into smaller issues.
@daminisatya please see the PR description :)
Force-pushed from 083c479 to 3d4d4cf
@Achint08 @djmgit @hemantjadon @kavithaenair @SKrPl @singhpratyush @daminisatya: please review, I have rebased the PR.
Left a few comments. Please take a look.
Also, please update the deployment link.
@@ -37,12 +40,12 @@

 public class QuoraProfileScraper extends BaseScraper {

-    private final long serialVersionUID = -3398701925784347310L;
+    private final long serialVersionUID = -3398701925784347312L;
Why do we need to change the serialVersionUID?
I have done this to show that this class is not backward compatible.
https://stackoverflow.com/questions/285793/what-is-a-serialversionuid-and-why-should-i-use-it
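A side note on the declaration shown in the diff, as a minimal sketch: Java serialization only honors `serialVersionUID` when it is declared `static final long`. An instance field with that name is ignored and the runtime computes a default UID, so changing its value would not signal incompatibility on its own. The class names below are hypothetical stand-ins, not the scraper classes from this PR.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class SerialVersionUidDemo {

    // Hypothetical stand-in mirroring the diff: an instance field.
    // Serialization ignores it and computes a default UID from the class shape.
    static class InstanceFieldUid implements Serializable {
        private final long serialVersionUID = -3398701925784347312L;
    }

    // Declared static final long, so the runtime uses this exact value.
    static class StaticFieldUid implements Serializable {
        private static final long serialVersionUID = -3398701925784347312L;
    }

    // Returns the UID that the serialization runtime actually uses for a class.
    static long effectiveUid(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("static field UID:   " + effectiveUid(StaticFieldUid.class));
        System.out.println("instance field UID: " + effectiveUid(InstanceFieldUid.class));
    }
}
```

If the intent is to mark the class as not backward compatible, the field would need the `static` modifier for the changed value to take effect.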
            dataThreads[i].join();
        }
    } catch(InterruptedException e) {
        DAO.severe("Couldn't complete all threads");
Maybe we need to mention the scraper name and type variable here so that it is easy to debug.
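One way the suggestion above could look, as a rough sketch only: the log message carries the scraper name and type, and the interrupt flag is restored after the exception is caught. `java.util.logging` stands in for the project's `DAO.severe`, and the names `joinAll`, `describeInterruption`, `scraperName`, and `type` are hypothetical, not taken from the PR.

```java
import java.util.logging.Logger;

public class JoinLoggingSketch {

    // Stand-in logger; the PR itself logs through DAO.severe.
    private static final Logger LOG = Logger.getLogger(JoinLoggingSketch.class.getName());

    // Builds a message that identifies which scraper and which type failed.
    static String describeInterruption(String scraperName, String type, int index, int total) {
        return String.format("%s (type=%s): interrupted while joining thread %d of %d",
                scraperName, type, index + 1, total);
    }

    // Waits for every thread; returns false if the wait was interrupted.
    static boolean joinAll(Thread[] dataThreads, String scraperName, String type) {
        for (int i = 0; i < dataThreads.length; i++) {
            try {
                dataThreads[i].join();
            } catch (InterruptedException e) {
                LOG.severe(describeInterruption(scraperName, type, i, dataThreads.length));
                // Restore the interrupt flag so callers can still observe it.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```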
@@ -266,14 +266,26 @@ private Timeline2 addPost(Post post) {
         return this;
     }

+    public void mergePost(Timeline2 list) {
Why not use Timeline2#putAll here?
I skipped that method because UserEntry is still not in use. I would have to set up a Post object for it.
Force-pushed from 9749393 to b9f0811
@singhpratyush, I have updated the test link, please see :)
The search results that I see for http://35.187.215.147/api/quoraprofilescraper?query=ronaldo look like this:
{
  "metadata": {
    "count": "11",
    "hits": 11
  },
  "peer_hash": "SfjbDL/f/9SH1ySjUSX8k+lZ2ocidIqVgxRpbwv1EA0=",
  "peer_hash_algorithm": "SHA-256",
  "session": {
    "identity": {
      "anonymous": true,
      "name": "10.12.1.1",
      "type": "host"
    }
  },
  "statuses": [
    {
      "bio": "",
      "feeds": {},
      "knows_about": [
        "Ronaldo",
        "Ronaldo",
        "Ronaldo",
        "Ronaldo",
        "Ronaldo"
      ],
      "profileImage": "https://qph.ec.quoracdn.net/main-thumb-86854754-50-apcitmmrwrdgxmcxzspksmqprxzawqzj.jpeg",
      "rss_feed_link": "https://www.quora.com/profile/ronaldo/rss",
      "search_url": "https://www.quora.com/profile/ronaldo",
      "timestamp": 1499189682253,
      "user": "Phillip Michael Mpalabule"
    },
    {
      "post_ques": "What are some famous gestures of respect in sports?",
      "post_type": "question",
      "post_url": "https://www.quora.com//What-are-some-famous-gestures-of-respect-in-sports",
      "search_url": "https://www.quora.com/search/?q=ronaldo&type=question",
      "timestamp": 1499189682080
    },
    ...
  ]
}
The users and questions are returned at the same level in the JSON, which is not good because they are completely different entities. Something like this would be better:
{
  "metadata": {
    ...
    "count": {
      "user": 1,
      "question": 11
    },
    ...
  },
  "users": [
    ...
  ],
  "questions": [
    ...
  ]
}
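The regrouping proposed above could be sketched roughly as follows. This is only an illustration using plain Java collections rather than the project's own JSON or timeline classes. It assumes that question entries carry `"post_type": "question"` (as in the sample output) and that every other entry is a user profile; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupedResponseSketch {

    // Splits a flat "statuses" list into separate "users" and "questions"
    // lists and records a per-type count under "metadata".
    static Map<String, Object> group(List<Map<String, Object>> statuses) {
        List<Map<String, Object>> users = new ArrayList<>();
        List<Map<String, Object>> questions = new ArrayList<>();

        for (Map<String, Object> status : statuses) {
            // Assumption: only question entries have post_type == "question".
            if ("question".equals(status.get("post_type"))) {
                questions.add(status);
            } else {
                users.add(status);
            }
        }

        Map<String, Object> count = new LinkedHashMap<>();
        count.put("user", users.size());
        count.put("question", questions.size());

        Map<String, Object> metadata = new LinkedHashMap<>();
        metadata.put("count", count);

        Map<String, Object> response = new LinkedHashMap<>();
        response.put("metadata", metadata);
        response.put("users", users);
        response.put("questions", questions);
        return response;
    }
}
```

Keeping each entity type under its own key means a client can read `users` or `questions` directly without inspecting every entry for a type field.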
@vibhcool I agree with @singhpratyush's idea; it would be great if the users and questions were at different levels.
@hemantjadon @singhpratyush , instead of arranging different posts according to different types, we can mention the
@vibhcool: This is not a good practice. If the entities are different, we should not put them in the same array. If we later happen to have lots of services to give as an answer, we can put them under different levels as follows -
See previous comment.
@vibhcool I agree with @singhpratyush here; they should be at different levels.
Force-pushed from e9c0a15 to 85561f8
@daminisatya @singhpratyush @kavithaenair @SKrPl @hemantjadon @Achint08 @djmgit Changes:
not changed:
LGTM 👍
The topics in knows_about differ from what the user actually knows. For example, the query was "Android" and the user returned is Marc Bodnick, but the things he actually knows about differ from Android (operating system).
[related screenshot]
Also, the search_url param has the value https://www.quora.com/profile/Android , which redirects to https://www.quora.com/topic/Android-operating-system , so is it possible to get the final URL (after redirection)?
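Getting the final URL is possible in principle. The following is a minimal sketch using only `java.net`, not the scraper's actual HTTP code: it follows `Location` headers by hand and returns the last URL reached. The class name, method names, and the hop limit of 5 are assumptions chosen for illustration, and whether the site answers `HEAD` requests this way is untested.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;

public class FinalUrlSketch {

    // Arbitrary cap to avoid looping forever on circular redirects.
    static final int MAX_REDIRECTS = 5;

    // Resolves a Location header (absolute or relative) against the request URL.
    static String resolveLocation(String requestUrl, String location) {
        return URI.create(requestUrl).resolve(location).toString();
    }

    // Follows redirects manually and returns the URL of the last hop.
    static String finalUrl(String startUrl) throws IOException {
        String current = startUrl;
        for (int hop = 0; hop < MAX_REDIRECTS; hop++) {
            HttpURLConnection conn = (HttpURLConnection) new URL(current).openConnection();
            conn.setInstanceFollowRedirects(false); // inspect each hop ourselves
            conn.setRequestMethod("HEAD");          // headers are enough here
            int status = conn.getResponseCode();
            String location = conn.getHeaderField("Location");
            conn.disconnect();

            boolean redirect = status >= 300 && status < 400 && location != null;
            if (!redirect) {
                return current;
            }
            current = resolveLocation(current, location);
        }
        return current;
    }
}
```

With something like this, `search_url` could hold the value returned by `finalUrl(...)` instead of the URL built from the query.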
@SKrPl yes, I am creating an issue for this; these aren't related to this PR's task :)
Fixes Issue: #1252
Test links:
http://35.184.148.24/api/quoraprofilescraper?query=Vibhor-Verma-5
http://35.184.148.24/api/quoraprofilescraper?query=asd