Added News API, tests, and documentation. #48

Rahi374 · 2017-02-14T04:56:18Z

But I can't figure out how to get Python 3 to work with libraries and I don't know how to do testing with Python. (I have Python 3 but pip doesn't work and I can't install python3-pip because of dependency hell)

At least I got the News API to work in Python 2.7.

azharichenko · 2017-02-14T08:39:09Z

@Rahi374 Don't worry about supporting Python 2.7, as we have officially dropped that version. Also, you pretty much need python3-pip so you can get the correct Python libraries. The regular pip on your system is probably only getting Python 2.7 packages. What OS are you running on your computer?

azharichenko · 2017-02-14T08:43:15Z

PittAPI/news.py

+	news_links = map((lambda i: i['href']), soup.find_all('a', class_="kgoui_list_item_action"))
+	news_links = map((lambda i: re.sub(r"\+at\+.+edu", "", i)), news_links)
+	news_links = map((lambda i: i.replace("/news", "https://m.pitt.edu/news")), news_links)
+	news_links = map((lambda i: unicode(i, 'utf-8')), news_links)


The tests are raising a TabError, which is simple to fix. You just need to replace the tabs in front of all the news_names and news_links lines with 8 spaces, and then it will pass the tests and work correctly. Python doesn't like it when you mix tabs and spaces, so it raises an error to force you to choose one; in our case, spaces.
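For anyone following along, the error can be reproduced without the real file. The sketch below compiles a made-up function (not the code from news.py) whose body has one line indented with a tab and the next with 8 spaces, then shows that expanding the tabs makes it compile.

```python
# Hypothetical snippet: the first body line uses a tab, the second uses 8 spaces.
mixed_source = "def f():\n\tx = 1\n        return x\n"

try:
    compile(mixed_source, "<demo>", "exec")
    raised_tab_error = False
except TabError:
    # Python 3 refuses indentation whose meaning depends on the tab width.
    raised_tab_error = True

# Replacing each tab with spaces (tab stops every 8 columns) makes it consistent.
fixed_source = mixed_source.expandtabs(8)
namespace = {}
exec(compile(fixed_source, "<demo>", "exec"), namespace)
result = namespace["f"]()
```

In an editor, the equivalent is converting the file's indentation to spaces in one pass rather than fixing lines by hand.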

azharichenko · 2017-02-14T08:46:07Z

PittAPI/news.py

+
+	map((lambda t, u: news.append({'title': t, 'url': u})), news_names, news_links)
+
+        if any(u'Load more...' in s for s in news_names):


On lines 44, 51, and 55, go ahead and drop the u prefix and the unicode() call. Python 3 takes care of Unicode for us, which is why we switched over to Python 3.
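A small check of that point, as a sketch using only the 'Load more...' string from the diff: in Python 3 every str is already Unicode text, the u prefix is accepted but has no effect, and the Python 2 unicode builtin no longer exists.

```python
title = 'Load more...'
same_title = u'Load more...'  # the u prefix is still accepted but changes nothing

is_text = isinstance(title, str)
prefixes_equal = (title == same_title)

try:
    unicode  # Python 2 builtin; not defined in Python 3
    has_unicode_builtin = True
except NameError:
    has_unicode_builtin = False
```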


Rahi374 · 2017-02-14T14:28:19Z

@azharichenko I know we dropped support, but that's the only Python I have that works with pip until I do a dist-upgrade.
I'm on Debian Jessie with broken dependencies; I think a dist-upgrade should fix it.

azharichenko · 2017-02-14T14:35:27Z

PittAPI/news.py

+        map((lambda t, u: news.append({'title': t, 'url': u})), news_names, news_links)
+
+        if any('Load more...' in s for s in news_names):
+            news.pop()


From the tests, news.pop() threw an IndexError saying that the list is empty. It looks like nothing gets appended to news. I'm trying to investigate what's going on.

My fix was news = list(map((lambda t, u: {'title': t, 'url': u}), news_names, news_links)). Though if this is supposed to return a list instead of a dict, then the test needs to be changed. Once I changed that line and changed the expected return type in the tests, the test passed.

My sample program that prints out the news prints out all the news.

The problem with assigning the output of the map to the news list is that it'll overwrite what was already there.

And we need to loop the get request a few times just like the Dining API because that's how the fetch works.

Ok, I see what you're talking about. This should fix that news.extend(list(map((lambda t, u: {'title': t, 'url': u}), news_names, news_links))). This should prevent overwriting the news list while allowing new elements to be added.

That sounds perfect.
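The likely cause of the empty list is that map in Python 3 returns a lazy iterator, so a map call used only for its side effects never runs unless something consumes it. A minimal sketch with made-up titles and example.com URLs (not real feed data):

```python
# Made-up sample data standing in for the scraped titles and links.
news_names = ['First story', 'Second story']
news_links = ['https://example.com/1', 'https://example.com/2']

# Python 3: map() is lazy, so append is never called here.
lazy_news = []
map(lambda t, u: lazy_news.append({'title': t, 'url': u}), news_names, news_links)

# Building the dicts and extending the list does the work eagerly
# and keeps whatever was already in the list.
news = [{'title': 'Earlier story', 'url': 'https://example.com/0'}]
news.extend({'title': t, 'url': u} for t, u in zip(news_names, news_links))
```

After this runs, lazy_news is still empty while news holds the earlier item plus the two new ones, which matches the IndexError seen on news.pop().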

azharichenko · 2017-02-14T14:44:15Z

Umm, I had the same broken dependencies issue a while ago; I think it was solved using apt install -f.

The recommendation online is to run these commands.

sudo dpkg --configure -a

sudo apt-get install -f

Rahi374 · 2017-02-14T15:12:14Z

I tried all that many times; didn't work.

RitwikGupta · 2017-02-14T16:57:56Z

PittAPI/news.py

+    while not end_loop:
+        url = 'https://m.pitt.edu/news/index.json?feed={}&id=&_object=kgoui_Rcontent_I0_Rcontent_I0&_object_include_html=1'.format(feed) + '&start=' + str(counter)
+        data = sess.get(url).json()  # Should be UTF-8 by JSON standard
+        soup = BeautifulSoup(data['response']['html'], 'lxml') #, parse_only=strainer)


BeautifulSoup is really slow. Raw string hacking might be better.

What do you mean?
Like just directly extract the html value from the json without BeautifulSoup?

If it's all JSON then just use the built-in json library, which would be way faster.

Wait, never mind. I do need BeautifulSoup since I'm processing the element of the JSON that contains only HTML.

RitwikGupta · 2017-02-14T16:58:34Z

PittAPI/news.py

+        news_links = map((lambda i: i.replace("/news", "https://m.pitt.edu/news")), news_links)
+        #news_links = map((lambda i: unicode(i, 'utf-8')), news_links)
+
+        map((lambda t, u: news.append({'title': t, 'url': u})), news_names, news_links)


This is really verbose and unreadable, don't hesitate to write something over multiple lines.

I wish Python allowed multiline lambdas.
I'll probably define a helper function and pass that into map.
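One possible shape for such a helper, as a sketch: the name clean_href and the sample href are hypothetical, while the regex and the replacement come from the lambdas in the diff. It takes a plain string rather than a BeautifulSoup tag.

```python
import re


def clean_href(href):
    """Turn a relative news href into an absolute m.pitt.edu URL.

    Hypothetical helper: it chains the two string steps that the
    review diff applies with separate map/lambda calls.
    """
    # Strip the "+at+...edu" suffix, as the original regex does.
    href = re.sub(r"\+at\+.+edu", "", href)
    # Prefix the relative path with the mobile site host.
    return href.replace("/news", "https://m.pitt.edu/news")


# Made-up href purely for illustration.
sample = clean_href("/news/story+at+pitt.edu")
```

A named function like this can be passed straight to map or used inside a comprehension, which keeps each step on its own line.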

RitwikGupta · 2017-02-14T16:59:25Z

PittAPI/news.py

+
+    end_loop = False
+    counter = 0
+    while not end_loop:


Not a fan of this while loop structure. It seems extremely hacky and there has to be a better way to do it than setting a sentinel.

The slightly better way of doing this is to get rid of the end_loop variable and use while True: for the loop and in the else on line 59 just use a simple break.
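A sketch of that loop shape, assuming a stand-in fetch_page function (hypothetical, no network access) that returns one page of items plus a flag saying whether more pages remain:

```python
# Made-up data standing in for the items the feed would return.
ALL_ITEMS = ['a', 'b', 'c', 'd', 'e']
PAGE_SIZE = 2


def fetch_page(start):
    """Hypothetical stand-in for the real request; returns (items, has_more)."""
    page = ALL_ITEMS[start:start + PAGE_SIZE]
    has_more = start + PAGE_SIZE < len(ALL_ITEMS)
    return page, has_more


def collect_all():
    results = []
    start = 0
    while True:
        page, has_more = fetch_page(start)
        results.extend(page)
        if not has_more:
            break  # replaces setting a sentinel such as end_loop = True
        start += len(page)
    return results


collected = collect_all()
```

The break keeps the exit condition next to the code that detects it, so no extra flag has to be tracked across iterations.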

RitwikGupta · 2017-02-14T17:01:14Z

PittAPI/news.py

+    end_loop = False
+    counter = 0
+    while not end_loop:
+        url = 'https://m.pitt.edu/news/index.json?feed={}&id=&_object=kgoui_Rcontent_I0_Rcontent_I0&_object_include_html=1'.format(feed) + '&start=' + str(counter)


Pass in the parameters using a payload dict like

payload = { "feed": feed, "id": "", "_object": "kgoui_Rcontent_I0_Rcontent_I0", ... }

I don't understand.

This essentially does the same thing, but in a much cleaner and more readable way:

payload = {
    "feed": feed,
    "id": "",
    "_object": "kgoui_Rcontent_I0_Rcontent_I0",
    "start": 0
}
data = sess.get('https://m.pitt.edu/news/index.json', params=payload).json()
...
payload["start"] += 10

I did not know that this was a thing. I'll give it a shot.
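To see what the payload dict turns into, the sketch below builds the query string with the standard library's urlencode instead of making a request. Passing params= to a requests session does this kind of encoding for you; the feed value is just the default from get_news, and the step of 10 follows the snippet above.

```python
from urllib.parse import urlencode

feed = "main_news"
payload = {
    "feed": feed,
    "id": "",
    "_object": "kgoui_Rcontent_I0_Rcontent_I0",
    "_object_include_html": 1,
    "start": 0,
}
base_url = "https://m.pitt.edu/news/index.json"

# Same query string the hand-built URL in the diff spells out.
first_url = base_url + "?" + urlencode(payload)

# Moving to the next page only touches one dict entry.
payload["start"] += 10
second_url = base_url + "?" + urlencode(payload)
```

Keeping the parameters in a dict means the paging logic changes a single key instead of concatenating strings.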

RitwikGupta · 2017-02-14T17:01:40Z

tests/news_test.py

+        self.assertIsInstance(news.get_news("main_news"), dict)
+        self.assertIsInstance(news.get_news("cssd"), dict)
+        self.assertIsInstance(news.get_news("news_chronicle"), dict)
+        self.assertIsInstance(news.get_news("news_alerts"), dict)


Add a new line at the end of the file

I usually do this; I don't know why it didn't happen.

azharichenko · 2017-02-14T17:16:58Z

PittAPI/news.py

+        map((lambda t, u: news.append({'title': t, 'url': u})), news_names, news_links)
+
+        if any('Load more...' in s for s in news_names):
+            news.pop()


Ok, I see what you're talking about. This should fix that news.extend(list(map((lambda t, u: {'title': t, 'url': u}), news_names, news_links))). This should prevent overwriting the news list while allowing new elements to be added.

Rahi374 · 2017-02-14T19:02:02Z

Travis is complaining that get_news doesn't return a dict; it's not supposed to.

azharichenko · 2017-02-14T19:07:56Z

Change the tests; they are all checking for dicts. Also change the not False to just True in the while loop.

azharichenko · 2017-02-14T20:22:21Z

Now I feel kind of bad for removing the humor; honestly, bring it back if you want. Otherwise, good news: it's passing. 🎉

RitwikGupta · 2017-02-14T20:41:31Z

PittAPI/news.py

+#strainer = SoupStrainer('div', attrs={'class': 'kgoui_list_item_textblock'})
+
+
+def get_news(feed="main_news"):


Add a kwarg to say how many news items to get. Default should be 10.
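A rough sketch of what that signature could look like. The parameter name max_news_items is an assumption, and the body uses made-up items in place of the real scraping so it runs without network access.

```python
def get_news(feed="main_news", max_news_items=10):
    """Return at most max_news_items news entries for the given feed.

    Sketch only: max_news_items is a hypothetical parameter name and
    the items below are placeholders for the scraped results.
    """
    all_items = [
        {'title': 'Story {}'.format(i), 'url': 'https://example.com/{}'.format(i)}
        for i in range(25)
    ]
    # Slicing never raises if fewer items than the limit are available.
    return all_items[:max_news_items]


default_items = get_news()
three_items = get_news("cssd", max_news_items=3)
```

In the real function the limit could also be used to stop requesting further pages once enough items have been collected.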

RitwikGupta · 2017-02-14T20:43:53Z

PittAPI/news.py

+        news_names = map((lambda i: i.getText()), soup.find_all('span', class_='kgoui_list_item_title'))
+        news_links = map(_href_to_url, soup.find_all('a', class_="kgoui_list_item_action"))
+
+        news.extend(list(map((lambda t, u: {'title': t, 'url': u}), news_names, news_links)))


Use itertools.chain() to merge generators together rather than forcing the generator to unroll to a list. Save the unrolling until the end.

What do you mean?

news can just be a generator until you return it. Until the return part, where you'd return list(news), use generator comprehensions instead of map for efficiency.

What is this generator that you speak of?

My Ruby brain can only handle maps.

It's an iterator

So then right before I return then I populate the news array using the news_names and news_links generators?

Will news_names not get overwritten when the loop runs again?

You could replace news.extend(list(map((lambda t, u: {'title': t, 'url': u}), news_names, news_links))) with something like news.extend([{'title': t, 'url': u} for (t, u) in zip(news_names, news_links)])

Or, as I was saying before, you could make news itself a generator and use itertools.chain() to merge the two by replacing the list comprehension with a generator comprehension 😉

Hm... I think I get the idea but not the practice.
I'll leave it to you.
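For reference, a sketch of the itertools.chain idea with made-up page data (the titles, paths, and the page_items helper are hypothetical). Each page contributes a generator, the generators are chained together, and nothing is turned into a list until the end.

```python
from itertools import chain


def page_items(names, links):
    """Lazily pair the titles and links of one page (hypothetical helper)."""
    # zip() captures this page's lists, so reassigning the loop
    # variables on the next iteration does not affect earlier pages.
    return ({'title': t, 'url': u} for t, u in zip(names, links))


# Made-up data: two "pages" of titles and links.
pages = [
    (['A', 'B'], ['/a', '/b']),
    (['C'], ['/c']),
]

news = iter(())  # start with an empty iterator
for news_names, news_links in pages:
    news = chain(news, page_items(news_names, news_links))

result = list(news)  # unroll once, right before returning
```

This also addresses the question about news_names being overwritten: each generator holds on to its own page's data, so later loop iterations do not change earlier results.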

RitwikGupta · 2017-02-14T20:44:29Z

tests/news_test.py

+        self.assertIsInstance(news.get_news("main_news"), list)
+        self.assertIsInstance(news.get_news("cssd"), list)
+        self.assertIsInstance(news.get_news("news_chronicle"), list)
+        self.assertIsInstance(news.get_news("news_alerts"), list)


New line at the end of file

I have like two new lines at the end of the file already.

Weird, they're not showing up for some reason. I'll take your word for it :P


azharichenko · 2017-02-15T12:44:46Z

Looks like the Lord has accepted your offering 🎉

Added News API, tests, and documentation.

8328550

azharichenko requested changes Feb 14, 2017


Fixed indentation problems caused by vim and removed explicit unicode strings

2e48964

azharichenko requested changes Feb 14, 2017


RitwikGupta self-requested a review February 14, 2017 16:56

RitwikGupta requested changes Feb 14, 2017


azharichenko requested changes Feb 14, 2017


Made code cleaner

12b55a1

Fixed news tests and removed my sense of humor

0f808a2

RitwikGupta requested changes Feb 14, 2017


Rahi374 added 5 commits February 15, 2017 02:23

Added parameter to get_news to limit number of news events returned. Also restored my sense of humor. And added tests for get_news with number limit

050f7bd

Please pass the tests

d2309c8

Why won't these tests pass

4918ee7

If it's not tested it can't fail.

d092f57

O Lord of Tests, Please Accept My Offering

169648c

RitwikGupta approved these changes Feb 15, 2017


RitwikGupta merged commit fca127d into pittcsc:master Feb 15, 2017

azharichenko mentioned this pull request Feb 19, 2017

Improving documentation #34

Closed

7 tasks

