Skip to content

Add scraping functions#21

Closed
Samwest5 wants to merge 1 commit into
progteam:masterfrom
Samwest5:master
Closed

Add scraping functions#21
Samwest5 wants to merge 1 commit into
progteam:masterfrom
Samwest5:master

Conversation

@Samwest5
Copy link
Copy Markdown
Collaborator

@Samwest5 Samwest5 commented Mar 1, 2019

Implemented two functions for retrieving data from Kattis:

get_score() scrapes the score of the profile that matches passed user_handle
get_rank() scrapes the rank of the profile that matches passed user_handle

Both functions use is_student() to determine if passed profile is from CSUMB.

Copy link
Copy Markdown
Collaborator

@donaldong donaldong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good start!

To make it better, I think it would be valuable to provide some unit tests along with this script.

For example, assert(get_points('donaldong') > 100) would be true, get_points('a') should raise Error404, get_points('xiaowuc1') should raise ErrorInvalid because Nick Wu is not at CSUMB.

Maybe we sould move this file as backend/scoreboard/kattis_scrapping.py. Then we can have backend/scoreboard/test_kattis_scrapping.py, then the test will be automatically discovered by Django. Here's more information: https://docs.djangoproject.com/en/2.1/topics/testing/overview/

Comment thread scripts/scraping.py Outdated
Comment thread scripts/scraping.py Outdated
Comment thread scripts/scraping.py Outdated
Comment thread scripts/scraping.py Outdated
Comment thread scripts/scraping.py Outdated
Comment thread scripts/scraping.py
Comment thread scripts/scraping.py
Comment thread scripts/scraping.py
@@ -0,0 +1,86 @@
import requests
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a duplicated file, perhaps checked in by accident.

@@ -0,0 +1,2 @@
from django.test import TestCase

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this file is still under construction?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I probably should have left that off the commit. I wanted to show that I hadn't ignored the test cases, but its a work in progress.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it!

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh btw, I think the file(s) are still in the scripts/ dir.

I think it would better to place them in backend/scoreboard because those functions will be used by the backend API.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, yes, I agree. Another woops on my part when I placed the folder

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's fixed now

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! 👍

Copy link
Copy Markdown
Collaborator

@donaldong donaldong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! Thanks for adding the tests!

Please rebase on top of the master branch, and do a force push to your master branch. Now we have the python linter, I think there are some lint errors here.

We're missing some docstrings. Here is an example: the test I wrote earlier https://github.com/progteam/parrot/blob/7865b948f6d7a8973c4f982c404f6c522842a4d2/backend/test_redirect_404.py



class KattisScrapeTestCase(TestCase):
def setUp(self):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's nothing to do here, we don't have to have this method.

try:
kattis_scraping.get_points("ryan-beck")
except kattis_scraping.Error404:
self.fail()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to assert it gonna give us as a valid number


def test_get_points_not_throws_exception_invalid(self):
try:
kattis_scraping.get_points("ryan-beck")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be a duplicate test case. I think we can combine them.


def test_get_rank_not_throws_exception_404(self):
try:
kattis_scraping.get_rank("ryan-beck")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the previous comment.


def test_get_rank_throws_exception_invalid(self):
with self.assertRaises(kattis_scraping.ErrorInvalid):
kattis_scraping.get_rank("xiaowuc1")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's more readable to group get_points and get_rank together?

I'm thinking about test_scraping_throws_invalid and test_scraping_throws_404.

from bs4 import BeautifulSoup


class Error(Exception):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... This is a little redundant to me

@donaldong
Copy link
Copy Markdown
Collaborator

Oh, another thing to consider: get_score vs get_points?

@donaldong
Copy link
Copy Markdown
Collaborator

I think rafael-rivas will result in ErrorInvalid for some reasons? Is this a bug?

@donaldong
Copy link
Copy Markdown
Collaborator

The previous bug is caused by missing cities since we're identifying the school name by the index for the anchor tag.

donaldong pushed a commit that referenced this pull request Apr 22, 2019
Implemented two functions for retrieving data from Kattis:

- get_score() scrapes the score of the profile that matches passed
  user_handle
- get_rank() scrapes the rank of the profile that matches passed
  user_handle

Both functions use is_student() to determine if passed profile is from
CSUMB.

PR: #21
Resolves: #17
donaldong pushed a commit that referenced this pull request Apr 22, 2019
Implemented two functions for retrieving data from Kattis:

- get_score() scrapes the score of the profile that matches passed
  user_handle
- get_rank() scrapes the rank of the profile that matches passed
  user_handle

Both functions use is_student() to determine if passed profile is from
CSUMB.

PR: #21
Resolves: #17
@donaldong
Copy link
Copy Markdown
Collaborator

Fixed these issues and pushed.

@donaldong donaldong closed this Apr 22, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants