Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added News API, tests, and documentation. #48

Merged
merged 9 commits into from
Feb 15, 2017

Conversation

Rahi374
Copy link
Member

@Rahi374 Rahi374 commented Feb 14, 2017

But I can't figure out how to get Python 3 to work with libraries and I don't know how to do testing with Python. (I have Python 3 but pip doesn't work and I can't install python3-pip because of dependency hell)

At least I got the News API to work in Python 2.7.

@azharichenko
Copy link
Member

azharichenko commented Feb 14, 2017

@Rahi374 Don't worry about support Python 2.7 as we have officially dropped the at version. Also, you pretty much need python3-pip so you can get the correct python libraries. The regular pip on your system is probably just getting just Python 2.7 packages. What OS are you running on your computer?

PittAPI/news.py Outdated
news_links = map((lambda i: i['href']), soup.find_all('a', class_="kgoui_list_item_action"))
news_links = map((lambda i: re.sub(r"\+at\+.+edu", "", i)), news_links)
news_links = map((lambda i: i.replace("/news", "https://m.pitt.edu/news")), news_links)
news_links = map((lambda i: unicode(i, 'utf-8')), news_links)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests are giving off a TabError, which is simple to fix. You just need to replace the tabs in front of all the news_names and news_links with 8 spaces and then it will pass the tests and work correctly. Python doesn't particularly like if you mix tabs and spaces so it's giving an error to force you to choose one, in our case spaces.

PittAPI/news.py Outdated

map((lambda t, u: news.append({'title': t, 'url': u})), news_names, news_links)

if any(u'Load more...' in s for s in news_names):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On lines 44, 51, 55 go ahead and drop the u and Unicode. Python 3 takes care of Unicode for us, hence why we switched over to Python 3.

@Rahi374
Copy link
Member Author

Rahi374 commented Feb 14, 2017

@azharichenko I know we dropped support but that's the only Python that I have that works with pip untill I do a dist-upgrade.
I'm on Debian Jessie with broken dependencies; I think a dist-upgrade should fix it.

PittAPI/news.py Outdated
map((lambda t, u: news.append({'title': t, 'url': u})), news_names, news_links)

if any('Load more...' in s for s in news_names):
news.pop()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the tests, news.pop() threw an IndexError saying that the list is empty. It looks like nothing gets appended to news. I'm trying to investigate what going on.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

news = list(map((lambda t, u: {'title': t, 'url': u}), news_names, news_links)) this was my fix. Though if this is suppose to return a list instead of dict then the test needs to be changed. Once I changed that line and change desired type to be returned in the tests it gave me the pass for the test.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My sample program that prints out the news prints out all the news.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with assigning the output of the map to the news list is that it'll overwrite what was already there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And we need to loop the get request a few times just like the Dining API because that's how the fetch works.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see what you're talking about. This should fix that news.extend(list(map((lambda t, u: {'title': t, 'url': u}), news_names, news_links))). This should prevent overwriting the news list while allowing new elements to be added.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds perfect.

@azharichenko
Copy link
Member

azharichenko commented Feb 14, 2017

Umm, I have had the same broken dependencies issue a while ago, I think it solved using apt install -f.


The recommendation online is to run these commands.

sudo dpkg --configure -a

sudo apt-get install -f

@Rahi374
Copy link
Member Author

Rahi374 commented Feb 14, 2017

I tried all that many times; didn't work.

@RitwikGupta RitwikGupta self-requested a review February 14, 2017 16:56
PittAPI/news.py Outdated
while not end_loop:
url = 'https://m.pitt.edu/news/index.json?feed={}&id=&_object=kgoui_Rcontent_I0_Rcontent_I0&_object_include_html=1'.format(feed) + '&start=' + str(counter)
data = sess.get(url).json() # Should be UTF-8 by JSON standard
soup = BeautifulSoup(data['response']['html'], 'lxml') #, parse_only=strainer)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BeautifulSoup is really slow. Raw string hacking might be better.

Copy link
Member Author

@Rahi374 Rahi374 Feb 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean?
Like just directly extract the html value from the json without BeautifulSoup?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's all JSON then just use the internal JSON library which would be way faster. Link to docs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait never mind I do need BeautifulSoup since I'm processing the element of the JSON that contains only html.

PittAPI/news.py Outdated
news_links = map((lambda i: i.replace("/news", "https://m.pitt.edu/news")), news_links)
#news_links = map((lambda i: unicode(i, 'utf-8')), news_links)

map((lambda t, u: news.append({'title': t, 'url': u})), news_names, news_links)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really verbose and unreadable, don't hesitate to write something over multiple lines.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish Python allowed multiline lambdas.
I'll probably define a helper function and pass that into map.

PittAPI/news.py Outdated

end_loop = False
counter = 0
while not end_loop:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a fan of this while loop structure. It seems extremely hacky and there has to be a better way to do it than setting a sentinel.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The slightly better way of doing this is to get rid of the end_loop variable and use while True: for the loop and in the else on line 59 just use a simple break.

PittAPI/news.py Outdated
end_loop = False
counter = 0
while not end_loop:
url = 'https://m.pitt.edu/news/index.json?feed={}&id=&_object=kgoui_Rcontent_I0_Rcontent_I0&_object_include_html=1'.format(feed) + '&start=' + str(counter)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pass in the parameters using a payload dict like

payload = {
    "feed": feed,
    "id": "",
    "_object": "kgoui_Rcontent_I0_Rcontent_I0",
...
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This essentially does the same thing but in a much more cleaner and readable way

payload = {
    "feed": feed,
    "id": "",
    "_object": "kgoui_Rcontent_I0_Rcontent_I0",
    "start": 0
    }
    data = sess.get('https://m.pitt.edu/news/index.json', params=payload).json()
     ...
     payload["start"] += 10

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not know that this was a thing. I'll give it a shot.

self.assertIsInstance(news.get_news("main_news"), dict)
self.assertIsInstance(news.get_news("cssd"), dict)
self.assertIsInstance(news.get_news("news_chronicle"), dict)
self.assertIsInstance(news.get_news("news_alerts"), dict)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a new line at the end of the file

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually do this I don't know why it didn't happen.

PittAPI/news.py Outdated
map((lambda t, u: news.append({'title': t, 'url': u})), news_names, news_links)

if any('Load more...' in s for s in news_names):
news.pop()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see what you're talking about. This should fix that news.extend(list(map((lambda t, u: {'title': t, 'url': u}), news_names, news_links))). This should prevent overwriting the news list while allowing new elements to be added.

@Rahi374
Copy link
Member Author

Rahi374 commented Feb 14, 2017

Travis is complaining that get_news doesn't return a dict; it's not supposed to.

@azharichenko
Copy link
Member

azharichenko commented Feb 14, 2017

Change the tests, they are all checking for dicts. Also change the not False to just True in the while loop

@azharichenko
Copy link
Member

azharichenko commented Feb 14, 2017

Now I feel kind of bad from removing the humor, honestly bring it back if you want. Otherwise good news it's passing. 🎉

PittAPI/news.py Outdated
#strainer = SoupStrainer('div', attrs={'class': 'kgoui_list_item_textblock'})


def get_news(feed="main_news"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a kwarg to say how many news items to get. Default should be 10.

news_names = map((lambda i: i.getText()), soup.find_all('span', class_='kgoui_list_item_title'))
news_links = map(_href_to_url, soup.find_all('a', class_="kgoui_list_item_action"))

news.extend(list(map((lambda t, u: {'title': t, 'url': u}), news_names, news_links)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use itertools.chain() to merge generators together rather than forcing the generator to unroll to a list. Save the unrolling until the end.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

news can just be a generator till your return it. Until the return part where you'd return list(news), use generator comprehensions instead of map for efficiency.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this generator that you speak of?

My Ruby brain can only handle maps.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an iterator

Copy link
Member Author

@Rahi374 Rahi374 Feb 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So then right before I return then I populate the news array using the news_names and news_links generators?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will news_names not get overwritten when the loop runs again?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could replace news.extend(list(map((lambda t, u: {'title': t, 'url': u}), news_names, news_links))) with something like news.extend([{'title': t, 'url': u} for (t,u) in zip(news_names, new_links)])

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or, as I was saying before, you could make news itself a generator and use itertools.chain() to merge the two by replacing the list comprehension with a generator comprehension 😉

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm... I think I get the idea but not the practice.
I'll leave it to you.

self.assertIsInstance(news.get_news("main_news"), list)
self.assertIsInstance(news.get_news("cssd"), list)
self.assertIsInstance(news.get_news("news_chronicle"), list)
self.assertIsInstance(news.get_news("news_alerts"), list)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New line at the end of file

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have like two new lines at the end of the file already.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird, they're not showing up for some reason. I'll take your word for it :P

@azharichenko
Copy link
Member

Looks like the Lord has accepted your offering 🎉

@RitwikGupta RitwikGupta merged commit fca127d into pittcsc:master Feb 15, 2017
@azharichenko azharichenko mentioned this pull request Feb 19, 2017
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants