k-appears/TweeterTL

Purpose

Investigate different ways to retrieve the timeline of a Twitter account.

Investigation using the official API

From the official libraries listed in the Twitter documentation:

  1. Hosebird Client, which consumes Twitter's Streaming API. It uses twitter4j internally to get the timeline.

Investigation using non-official APIs

  1. GitHub library that uses Advanced Search

    • PROS: No limit on the number of tweets
    • CONS: Not all tweets are present. See the documentation and the two screenshots.
  2. GitHub library that uses the timeline API https://twitter.com/i/search/timeline?&q=from:LetGo&f=tweets

    • In testing, it did not parse the timeline correctly
  3. Custom library that issues the calls made by the scroll-down feature and scrapes the HTML elements

    • PROS: No limit on the number of requests
    • CONS: Limited to between 800 and 900 tweets
  4. twitter4j is an unofficial Java library for the Twitter API.

Consideration

  • To retrieve the timeline of a given user, this project uses the twitter4j library.

    Getting a user's timeline has these restrictions:

    Response formats                        JSON
    Requires authentication?                Yes
    Rate limited?                           Yes
    Requests / 15-min window (user auth)    900
    Requests / 15-min window (app auth)     1500
    
  • Authentication keys

    To use your own Twitter authentication keys, set them as JVM parameters:

    • OAuthConsumerSecret
    • OAuthConsumerKey
    • OAuthAccessToken
    • OAuthAccessTokenSecret

    For example: -DOAuthConsumerSecret=XXXXX -DOAuthConsumerKey=XXXXX -DOAuthAccessToken=XXXXX -DOAuthAccessTokenSecret=XXXXX

    Alternatively, change twitter.properties.

  • To reduce the number of requests, Twitter provides a pagination feature.

    The maximum in paging is 1000.

    With twitter4j, however, the maximum is 200. The documentation of the deprecated getUserTimeline method attributes this to performance reasons.

  • MainServerTest could validate the output more strictly, for example with a JSON Schema validator.
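
The key lookup described in the authentication bullet (JVM parameters, or else twitter.properties) could be sketched as follows. This is a minimal illustration rather than the project's code: the class `CredentialResolver` is hypothetical, and it assumes that a JVM parameter takes precedence over the file and that the file uses the same key names as the JVM parameters.

```java
import java.util.Properties;

/** Hypothetical helper: resolves one Twitter OAuth setting by name. */
final class CredentialResolver {

    private CredentialResolver() {}

    /**
     * Returns the JVM system property (-Dname=value) when it is set and
     * non-empty; otherwise the value found in the supplied properties
     * (for example loaded from twitter.properties); otherwise null.
     * The precedence order is an assumption of this sketch.
     */
    static String resolve(String name, Properties fileProps) {
        String fromJvm = System.getProperty(name);
        if (fromJvm != null && !fromJvm.isEmpty()) {
            return fromJvm;
        }
        return fileProps.getProperty(name);
    }
}
```

A caller would load twitter.properties into a `Properties` object once and then call `CredentialResolver.resolve("OAuthConsumerKey", props)` for each of the four keys.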

Restrictions on the use of the supported Twitter API

Rules and policy

Don’t!

  • Violate these or other policies. Be extra mindful of our rules about abuse and user privacy.
  • Abuse the Twitter API or attempt to circumvent rate limits.
  • Use non-API-based forms of automation, such as scripting the Twitter website. The use of these techniques may result in the permanent suspension of your account.

Restrictions in timelines Twitter API

This method can only return up to 3,200 of a user’s most recent Tweets. Native retweets of other statuses by the user is included in this total, regardless of whether include_rts is set to false when requesting this resource.

Resource URL https://api.twitter.com/1.1/statuses/user_timeline.json

Resource Information

    Response formats                        JSON
    Requires authentication?                Yes
    Rate limited?                           Yes
    Requests / 15-min window (user auth)    900
    Requests / 15-min window (app auth)     1500
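
To see how the 3,200-tweet cap interacts with the 200-per-request maximum noted for twitter4j, the sketch below computes how many timeline requests a given number of tweets would need. The class and method names are hypothetical; only the two limits come from the text above.

```java
/** Hypothetical helper: request arithmetic for the user timeline limits. */
final class PagePlanner {

    /** Maximum tweets per request when paging with twitter4j. */
    static final int MAX_PER_REQUEST = 200;

    /** Maximum number of recent tweets the timeline method can return. */
    static final int MAX_TIMELINE = 3200;

    private PagePlanner() {}

    /**
     * Number of requests needed to fetch the requested number of tweets,
     * after clamping the request to the range [0, MAX_TIMELINE].
     */
    static int requestsNeeded(int requestedTweets) {
        int capped = Math.max(0, Math.min(requestedTweets, MAX_TIMELINE));
        // Integer ceiling division: a partly filled last page still costs a request.
        return (capped + MAX_PER_REQUEST - 1) / MAX_PER_REQUEST;
    }
}
```

Under these limits a full 3,200-tweet timeline costs 16 requests, a small fraction of the 900 requests allowed per 15-minute window with user auth.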

Restrictions on creating an account

Because I did not provide a phone number, account creation was restricted (see screenshot).

Solution Implementation details

  • The main() method is in MainServer.java, which starts an HTTP server using only classes from the JDK. The reason is to be able to tune the policy for request overflow.

  • The operation that retrieves the tweets is idempotent, therefore GET is used.

  • To add a new endpoint, create a new class implementing the Handler interface and add the logic that triggers it to FactoryHandler.

  • Tweets can be locale sensitive. The Locale class is used to identify the language so that it can be represented.

  • Since no information is stored, there is no need to monitor memory. If a cache is implemented, however, an overflowing cache can lead to memory leaks.

  • Exception handling: getUserTimeline throws the checked exception TwitterException, which is handled in MainServer#initizalizeContext(). It could also throw an unchecked exception, which is why } catch (Exception e) { is used inside this method.

  • UTF-8 is used to decode characters. In the current implementation emoji are represented, but Japanese kanji are not.

  • If a request is not a GET with the correct parameters, the reason is not shown to the client.

  • System.out.println is used when an error occurs, such as maximum memory reached or an unchecked exception caught. A later step should replace it with a logging mechanism.
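
The server design in the list above (JDK-only HTTP server, GET-only endpoint, UTF-8 output) could be sketched roughly as follows. This is not the project's MainServer: the class names, the `/timeline` path, the placeholder body, and the choice of status 405 with an empty body for non-GET requests are all assumptions made for illustration.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of a JDK-only server with one GET endpoint. */
final class TimelineServerSketch {

    private TimelineServerSketch() {}

    /** Answers GET with a placeholder JSON body; rejects other methods. */
    static final class GetOnlyHandler implements HttpHandler {
        @Override
        public void handle(HttpExchange exchange) throws IOException {
            try {
                if (!"GET".equalsIgnoreCase(exchange.getRequestMethod())) {
                    // Assumed behaviour: 405 with no body, so no reason is shown.
                    exchange.sendResponseHeaders(405, -1);
                    return;
                }
                // Placeholder payload; a real handler would serialize tweets here.
                byte[] body = "[]".getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders()
                        .set("Content-Type", "application/json; charset=utf-8");
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            } finally {
                exchange.close();
            }
        }
    }

    /** Starts the server on the given port; port 0 picks any free port. */
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/timeline", new GetOnlyHandler());
        server.start();
        return server;
    }
}
```

The second argument of `HttpServer.create` is the socket backlog, which is one of the knobs a JDK-only server exposes for handling request overflow. A value of 0 uses the system default.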

Testing

  • The code was implemented using TDD. It was first divided into 2 modules, one creating the server and one creating the request handler, which were then joined using integration tests.

  • Unit tests can run without Internet connectivity. Regression tests need it.

  • The test that uses the non-official, unsupported Advanced Search library fails erratically. See the test searchNumberTweetsByUser_3201tweets() in AdvancedSearchAPITest.java:

    org.json.JSONException: JSONObject["min_position"] not a string.
        at org.json.JSONObject.getString(JSONObject.java:725)
        at me.jhenrique.manager.TweetManager.getTweets(TweetManager.java:81)
        at twitter.AdvancedSearchAPI.searchNumberTweetsByUser(AdvancedSearchAPI.java:39)
        at integration.twitter.AdvancedSearchAPITest.searchNumberTweetsByUser_3199tweets(AdvancedSearchAPITest.java:65)

    java.lang.AssertionError:
    Expected size:<3199> but was:<157> in: ...

  • The JMeter load test, included in the project, iterates a simple request with the same username and tweet_number.

Strategy to use for implementing the cache

Use cases in which the cache is invalidated:

  1. A user is disabled/blocked/removed
  2. A new Tweet is created
  3. A user removes a tweet

Possible ways to solve it:

  1. If the user is invalid: use the Twitter API to look up the username. If it does not exist, evict the user from the cache and return an empty response.

  2. If new tweets were created: use the implemented library that scrapes the scroll-down endpoint, which has no request limit, to validate that the first 5 tweets are at the top of the cache:

    • If not included: use getUserTimeline and append the tweets to the cache
    • If included: return the latest num_tweets in the cache
  3. If a tweet is removed: use the Account Activity Twitter API to subscribe to a user's activity. The service that provides the activity of any user is an Enterprise service that requires a paid subscription. In case we have access to an Enterprise subscription:

    • If user not in cache: add the user to the cache, storing the latest tweets retrieved from the getUserTimeline Twitter API and the time of the latest activity. Then subscribe to the user activity
    • If user in cache:
      1. Check the activity from the subscription of the given user.
      2. If user has deleted tweet from the last
      3. Update time of the latest activity

    A free alternative is to use the Twitter Streaming API to retrieve deleted tweets (see the Stack Overflow answer), but the Streaming Twitter API is deprecated and its documentation is not available.
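
The freshness check from step 2 (are the first 5 scraped tweets at the top of the cache?) could look roughly like this. It is a sketch under assumptions: tweets are represented by their numeric IDs, both lists are ordered newest first, and the class and method names are hypothetical.

```java
import java.util.List;

/** Hypothetical helper: decides whether the cached timeline head is current. */
final class CacheFreshness {

    /** Number of newest tweets compared against the cache. */
    static final int PROBE_SIZE = 5;

    private CacheFreshness() {}

    /**
     * Returns true when the newest scraped tweet IDs (up to PROBE_SIZE)
     * equal the first entries of the cached ID list, in the same order.
     * Both lists are assumed to be ordered newest first. An empty scrape
     * is treated as "nothing to contradict the cache" and returns true.
     */
    static boolean isHeadCurrent(List<Long> cachedIds, List<Long> scrapedIds) {
        int n = Math.min(PROBE_SIZE, scrapedIds.size());
        if (cachedIds.size() < n) {
            return false;
        }
        return cachedIds.subList(0, n).equals(scrapedIds.subList(0, n));
    }
}
```

When the method returns false, the caller would refresh from getUserTimeline as described above. When it returns true, the cached tweets can be served directly.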

Further improvements to reduce Twitter API calls

  • Use the Twitter API to obtain the API rate limit status before querying the timeline API
  • Use a proper logging mechanism to triage errors
  • The client can throttle its requests if the server adds more information when a request is not successful:
    • Quota limit reached
    • Http request parameters not correct
    • User does not exist
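
The first improvement, checking the rate limit status before querying the timeline, could be reduced to a small decision like the one below. The sketch assumes the status provides a remaining-request count and a reset time in epoch seconds, as the v1.1 rate limit status response does. The class and method names are hypothetical.

```java
/** Hypothetical helper: decides how long to wait before the next API call. */
final class RateLimitGuard {

    private RateLimitGuard() {}

    /**
     * Returns the number of milliseconds to wait before issuing the next
     * request: 0 when requests remain in the current window or the window
     * has already reset, otherwise the time left until the reset.
     *
     * @param remaining         requests left in the current window
     * @param resetEpochSeconds window reset time, in seconds since the epoch
     * @param nowEpochMillis    current time, in milliseconds since the epoch
     */
    static long millisToWait(int remaining, long resetEpochSeconds, long nowEpochMillis) {
        if (remaining > 0) {
            return 0L;
        }
        long resetMillis = resetEpochSeconds * 1000L;
        return Math.max(0L, resetMillis - nowEpochMillis);
    }
}
```

A caller would pass `System.currentTimeMillis()` as the current time and either sleep for the returned duration or answer the client with a "quota limit reached" reason.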

Development environment

  • Java version: 8
