Fix issue #93 - uses re to parse links in tweet #102

pravj · 2015-04-14T16:26:47Z

Previously, the open {id} command relied only on prefix matching to parse links from tweets.
It split the tweet text and collected every word that starts with http or https.

As explained in a sample tweet in issue #93:

Super excited to have David DeSandro (@desandro) helping us revamp forecast.io (soon to be http://darksky.net).

The link parsed here is 'http://darksky.net).' instead of 'http://darksky.net', which is wrong.
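To make the failure concrete, here is a minimal sketch of the prefix-matching approach described above. The function name `links_by_prefix` is hypothetical and this is a reconstruction from the description, not the project's actual code.

```python
# Sample tweet text quoted from issue #93.
tweet_text = (
    "Super excited to have David DeSandro (@desandro) helping us revamp "
    "forecast.io (soon to be http://darksky.net)."
)


def links_by_prefix(text):
    """Hypothetical reconstruction of the old approach: split the text
    and keep every word that starts with http (covers http and https)."""
    return [word for word in text.split() if word.startswith("http")]


old_links = links_by_prefix(tweet_text)
print(old_links)  # ['http://darksky.net).']
```

The trailing `).` stays attached because splitting only separates words, it does not strip punctuation.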

Now it uses the re module to parse the links instead.

Shortened URLs in tweets are 22 or 23 characters long, depending on the protocol (http or https), so every short URL ends with a unique ID of length 10.

So a simple re.findall with https?://t.co/[a-zA-Z0-9]{10} as the search pattern returns a list of all the links in the tweet text.
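A minimal sketch of that idea follows, assuming the tweet text contains the t.co form of each link. The helper name `find_short_links` and the two short IDs in the sample are made up for illustration, and the dots are escaped here (`t\.co`), which is slightly stricter than the pattern written above.

```python
import re

# Same pattern as described above, with the dots escaped so that
# "." only matches a literal dot (an adjustment, not the PR's exact pattern).
SHORT_URL_PATTERN = re.compile(r"https?://t\.co/[a-zA-Z0-9]{10}")


def find_short_links(text):
    """Return every t.co short URL found in the text (hypothetical helper)."""
    return SHORT_URL_PATTERN.findall(text)


# Made-up short IDs, used only to show that trailing punctuation is dropped.
sample = "Soon to be https://t.co/AbCdEf1234). Also see http://t.co/Zz99yy88xx!"
links = find_short_links(sample)
print(links)  # ['https://t.co/AbCdEf1234', 'http://t.co/Zz99yy88xx']
```

Because the pattern consumes exactly 10 alphanumeric characters after the slash, the closing `).` and `!` are never part of a match.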

That's it.


orakaro · 2015-04-16T02:39:14Z

Hi @pravj! Thanks for suggesting this. I will test it for a while before merging 🍺

Tenzer · 2015-05-26T07:59:44Z

Wouldn't it be simpler and more reliable to just use the entities list Twitter includes with every tweet? https://dev.twitter.com/overview/api/entities-in-twitter-objects#urls
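For comparison, a minimal sketch of reading links from the entities list. The `sample_tweet` dict is hand-written and trimmed to the relevant fields (`url`, `expanded_url`, `display_url` under `entities.urls`), the short ID is made up, and `links_from_entities` is a hypothetical helper, not the function that was eventually changed.

```python
def links_from_entities(tweet):
    """Collect links from a tweet's entities instead of parsing the text.

    Prefers the expanded URL and falls back to the t.co URL if it is missing.
    """
    urls = tweet.get("entities", {}).get("urls", [])
    return [u.get("expanded_url") or u["url"] for u in urls]


# Hand-written stand-in for a tweet object, trimmed to the fields used here.
sample_tweet = {
    "text": "forecast.io (soon to be http://t.co/AbCdEf1234).",
    "entities": {
        "urls": [
            {
                "url": "http://t.co/AbCdEf1234",  # made-up short ID
                "expanded_url": "http://darksky.net",
                "display_url": "darksky.net",
            }
        ]
    },
}

entity_links = links_from_entities(sample_tweet)
print(entity_links)  # ['http://darksky.net']
```

With this approach the surrounding punctuation in the text does not matter, since the link boundaries come from the tweet's entities rather than from the text.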

orakaro · 2015-07-31T01:41:25Z

Thanks @Tenzer. I will take a closer look at the entities list.

Tenzer · 2015-07-31T06:17:29Z

@dtvd You can take a look at the change I made to repall, which switched it to use entities: #90.

orakaro · 2015-07-31T06:21:59Z

@Tenzer Thanks. I just updated this function to use entities.

orakaro closed this Jul 31, 2015
