-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added News API, tests, and documentation. #48
Conversation
@Rahi374 Don't worry about support Python 2.7 as we have officially dropped the at version. Also, you pretty much need |
PittAPI/news.py
Outdated
news_links = map((lambda i: i['href']), soup.find_all('a', class_="kgoui_list_item_action")) | ||
news_links = map((lambda i: re.sub(r"\+at\+.+edu", "", i)), news_links) | ||
news_links = map((lambda i: i.replace("/news", "https://m.pitt.edu/news")), news_links) | ||
news_links = map((lambda i: unicode(i, 'utf-8')), news_links) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests are giving off a TabError
, which is simple to fix. You just need to replace the tabs in front of all the news_names
and news_links
with 8 spaces and then it will pass the tests and work correctly. Python doesn't particularly like if you mix tabs and spaces so it's giving an error to force you to choose one, in our case spaces.
PittAPI/news.py
Outdated
|
||
map((lambda t, u: news.append({'title': t, 'url': u})), news_names, news_links) | ||
|
||
if any(u'Load more...' in s for s in news_names): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On lines 44, 51, 55 go ahead and drop the u
and Unicode
. Python 3 takes care of Unicode for us, hence why we switched over to Python 3.
@azharichenko I know we dropped support but that's the only Python that I have that works with pip untill I do a dist-upgrade. |
PittAPI/news.py
Outdated
map((lambda t, u: news.append({'title': t, 'url': u})), news_names, news_links) | ||
|
||
if any('Load more...' in s for s in news_names): | ||
news.pop() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the tests, news.pop()
threw an IndexError
saying that the list is empty. It looks like nothing gets appended to news. I'm trying to investigate what going on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
news = list(map((lambda t, u: {'title': t, 'url': u}), news_names, news_links))
this was my fix. Though if this is suppose to return a list
instead of dict
then the test needs to be changed. Once I changed that line and change desired type to be returned in the tests it gave me the pass for the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My sample program that prints out the news prints out all the news.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with assigning the output of the map to the news list is that it'll overwrite what was already there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And we need to loop the get request a few times just like the Dining API because that's how the fetch works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I see what you're talking about. This should fix that news.extend(list(map((lambda t, u: {'title': t, 'url': u}), news_names, news_links)))
. This should prevent overwriting the news
list while allowing new elements to be added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds perfect.
Umm, I have had the same broken dependencies issue a while ago, I think it solved using The recommendation online is to run these commands. sudo dpkg --configure -a
sudo apt-get install -f |
I tried all that many times; didn't work. |
PittAPI/news.py
Outdated
while not end_loop: | ||
url = 'https://m.pitt.edu/news/index.json?feed={}&id=&_object=kgoui_Rcontent_I0_Rcontent_I0&_object_include_html=1'.format(feed) + '&start=' + str(counter) | ||
data = sess.get(url).json() # Should be UTF-8 by JSON standard | ||
soup = BeautifulSoup(data['response']['html'], 'lxml') #, parse_only=strainer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BeautifulSoup is really slow. Raw string hacking might be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean?
Like just directly extract the html value from the json without BeautifulSoup?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's all JSON then just use the internal JSON library which would be way faster. Link to docs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait never mind I do need BeautifulSoup since I'm processing the element of the JSON that contains only html.
PittAPI/news.py
Outdated
news_links = map((lambda i: i.replace("/news", "https://m.pitt.edu/news")), news_links) | ||
#news_links = map((lambda i: unicode(i, 'utf-8')), news_links) | ||
|
||
map((lambda t, u: news.append({'title': t, 'url': u})), news_names, news_links) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really verbose and unreadable, don't hesitate to write something over multiple lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wish Python allowed multiline lambdas.
I'll probably define a helper function and pass that into map.
PittAPI/news.py
Outdated
|
||
end_loop = False | ||
counter = 0 | ||
while not end_loop: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a fan of this while loop structure. It seems extremely hacky and there has to be a better way to do it than setting a sentinel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The slightly better way of doing this is to get rid of the end_loop
variable and use while True:
for the loop and in the else on line 59 just use a simple break
.
PittAPI/news.py
Outdated
end_loop = False | ||
counter = 0 | ||
while not end_loop: | ||
url = 'https://m.pitt.edu/news/index.json?feed={}&id=&_object=kgoui_Rcontent_I0_Rcontent_I0&_object_include_html=1'.format(feed) + '&start=' + str(counter) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pass in the parameters using a payload dict like
payload = {
"feed": feed,
"id": "",
"_object": "kgoui_Rcontent_I0_Rcontent_I0",
...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This essentially does the same thing but in a much more cleaner and readable way
payload = {
"feed": feed,
"id": "",
"_object": "kgoui_Rcontent_I0_Rcontent_I0",
"start": 0
}
data = sess.get('https://m.pitt.edu/news/index.json', params=payload).json()
...
payload["start"] += 10
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not know that this was a thing. I'll give it a shot.
tests/news_test.py
Outdated
self.assertIsInstance(news.get_news("main_news"), dict) | ||
self.assertIsInstance(news.get_news("cssd"), dict) | ||
self.assertIsInstance(news.get_news("news_chronicle"), dict) | ||
self.assertIsInstance(news.get_news("news_alerts"), dict) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a new line at the end of the file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I usually do this I don't know why it didn't happen.
PittAPI/news.py
Outdated
map((lambda t, u: news.append({'title': t, 'url': u})), news_names, news_links) | ||
|
||
if any('Load more...' in s for s in news_names): | ||
news.pop() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I see what you're talking about. This should fix that news.extend(list(map((lambda t, u: {'title': t, 'url': u}), news_names, news_links)))
. This should prevent overwriting the news
list while allowing new elements to be added.
Travis is complaining that |
Change the tests, they are all checking for |
Now I feel kind of bad from removing the humor, honestly bring it back if you want. Otherwise good news it's passing. 🎉 |
PittAPI/news.py
Outdated
#strainer = SoupStrainer('div', attrs={'class': 'kgoui_list_item_textblock'}) | ||
|
||
|
||
def get_news(feed="main_news"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a kwarg to say how many news items to get. Default should be 10.
news_names = map((lambda i: i.getText()), soup.find_all('span', class_='kgoui_list_item_title')) | ||
news_links = map(_href_to_url, soup.find_all('a', class_="kgoui_list_item_action")) | ||
|
||
news.extend(list(map((lambda t, u: {'title': t, 'url': u}), news_names, news_links))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use itertools.chain()
to merge generators together rather than forcing the generator to unroll to a list. Save the unrolling until the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
news
can just be a generator till your return it. Until the return part where you'd return list(news)
, use generator comprehensions instead of map for efficiency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this generator that you speak of?
My Ruby brain can only handle maps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's an iterator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So then right before I return then I populate the news
array using the news_names
and news_links
generators?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will news_names
not get overwritten when the loop runs again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could replace news.extend(list(map((lambda t, u: {'title': t, 'url': u}), news_names, news_links)))
with something like news.extend([{'title': t, 'url': u} for (t,u) in zip(news_names, new_links)])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or, as I was saying before, you could make news
itself a generator and use itertools.chain()
to merge the two by replacing the list comprehension with a generator comprehension 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm... I think I get the idea but not the practice.
I'll leave it to you.
tests/news_test.py
Outdated
self.assertIsInstance(news.get_news("main_news"), list) | ||
self.assertIsInstance(news.get_news("cssd"), list) | ||
self.assertIsInstance(news.get_news("news_chronicle"), list) | ||
self.assertIsInstance(news.get_news("news_alerts"), list) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New line at the end of file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have like two new lines at the end of the file already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Weird, they're not showing up for some reason. I'll take your word for it :P
…Also restored my sense of humor. And added tests for get_news with number limit
Looks like the Lord has accepted your offering 🎉 |
But I can't figure out how to get Python 3 to work with libraries and I don't know how to do testing with Python. (I have Python 3 but pip doesn't work and I can't install python3-pip because of dependency hell)
At least I got the News API to work in Python 2.7.