New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extract complete video URLs for Tweets #1206

Merged
merged 1 commit into from Jun 1, 2017

Conversation

Projects
None yet
5 participants
@singhpratyush
Member

singhpratyush commented May 30, 2017

Short description

Solves #1205.

Related #1171, #1193.

This implementation mimics the video playback flow of mobile react app of Twitter.

  1. Extract BEARER_TOKEN holding script's URL.
  2. Extract guest session token.
  3. Extract BEARER_TOKEN from URL in 1.
  4. Make Twitter API call with the parameters.

Example -

$ http 'http://127.0.0.1:9000/api/search.json?q=dance+video&source=twitter'
HTTP/1.1 200 OK
Content-Length: 28907
Content-Type: application/json;charset=utf-8
Date: Tue, 30 May 2017 19:40:34 GMT
Expires: Tue, 30 May 2017 19:40:36 GMT
Last-Modified: Tue, 30 May 2017 19:40:34 GMT
Server: Jetty(9.3.z-SNAPSHOT)
Set-Cookie: JSESSIONID=m99m3a8oqnpkkzi9ns2xucz8;Path=/
X-Robots-Tag: noindex,noarchive,nofollow,nosnippet

{
    "readme_0": "THIS JSON IS THE RESULT OF YOUR SEARCH QUERY - THERE IS NO WEB PAGE WHICH SHOWS THE RESULT!",
    "readme_1": "loklak.org is the framework for a message search system, not the portal, read: http://loklak.org/about.html#notasearchportal",
    "readme_2": "This is supposed to be the back-end of a search portal. For the api, see http://loklak.org/api.html",
    "readme_3": "Parameters q=(query), source=(cache|backend|twitter|all), callback=p for jsonp, maximumRecords=(message count), minified=(true|false)",
    "search_metadata": {
        "cache_hits": 0,
        "client": "127.0.0.1",
        "count": "15",
        "count_backend": 0,
        "count_twitter_all": 0,
        "count_twitter_new": 15,
        "filter": "",
        "hits": 15,
        "maximumRecords": "20",
        "period": 7429,
        "query": "dance video",
        "scraperInfo": "local",
        "servicereduction": "false",
        "startRecord": "1",
        "time": 29043
    },
    "statuses": [
        ...
        {
            "audio": [],
            "audio_count": 0,
            "canonical_id": "",
            "classifier_language": "english",
            "classifier_language_probability": 4.53604936798195e-16,
            "created_at": "2017-05-30T19:39:19.000Z",
            "favourites_count": 0,
            "hashtags": [
                "dedication"
            ],
            "hashtags_count": 1,
            "hosts": [
                "pic.twitter.com"
            ],
            "hosts_count": 1,
            "id_str": "869639376531324928",
            "images": [
                "https://pbs.twimg.com/ext_tw_video_thumb/869638871658749954/pu/img/5xW6eHiTJe3x3Z8h.jpg",
                "https://pic.twitter.com/idrtzwp3Nw"
            ],
            "images_count": 2,
            "link": "https://twitter.com/annamorgandance/status/869639376531324928",
            "links": [
                "https://pic.twitter.com/idrtzwp3Nw"
            ],
            "links_count": 1,
            "mentions": [],
            "mentions_count": 0,
            "parent": "",
            "place_context": "ABOUT",
            "place_id": "",
            "place_name": "",
            "provider_type": "SCRAPED",
            "retweet_count": 0,
            "screen_name": "annamorgandance",
            "source_type": "TWITTER",
            "text": "My beaut DDE students Hannah & Carrie who can't resist a quick plié even when at Harry Potter World! #dedication https://pic.twitter.com/idrtzwp3Nw",
            "text_length": 147,
            "timestamp": "2017-05-30T19:40:57.207Z",
            "unshorten": {},
            "user": {
                "appearance_first": "2017-05-30T19:40:57.207Z",
                "appearance_latest": "2017-05-30T19:40:57.207Z",
                "name": "Anna Morgan Dance",
                "profile_image_url_https": "https://pbs.twimg.com/profile_images/868530941580513280/4lC0yEQz_bigger.jpg",
                "screen_name": "annamorgandance",
                "user_id": "745614723148877824"
            },
            "videos": [
                "https://video.twimg.com/ext_tw_video/869638871658749954/pu/vid/480x480/x2lkY3u7inN7g1wD.mp4",
                "https://video.twimg.com/ext_tw_video/869638871658749954/pu/vid/240x240/ANoa8-sHOAm0WJ7S.mp4",
                "https://video.twimg.com/ext_tw_video/869638871658749954/pu/vid/720x720/4W2O7kbxklEZBKyp.mp4",
                "https://video.twimg.com/ext_tw_video/869638871658749954/pu/pl/2l0ByK24IQ2qFM2o.m3u8"
            ],
            "videos_count": 4,
            "without_l_len": 112,
            "without_lu_len": 112,
            "without_luh_len": 100
        },
        ...
        {
            "audio": [],
            "audio_count": 0,
            "canonical_id": "",
            "classifier_language": "english",
            "classifier_language_probability": 6.234241689372219e-15,
            "created_at": "2017-05-30T19:38:39.000Z",
            "favourites_count": 0,
            "hashtags": [],
            "hashtags_count": 0,
            "hosts": [
                "pic.twitter.com"
            ],
            "hosts_count": 1,
            "id_str": "869639212047454210",
            "images": [
                "https://pbs.twimg.com/ext_tw_video_thumb/869639124390625280/pu/img/pJQLGxxK5FMHDRxS.jpg",
                "https://pic.twitter.com/jzgfD4cHos"
            ],
            "images_count": 2,
            "link": "https://twitter.com/prettylittlerem/status/869639212047454210",
            "links": [
                "https://pic.twitter.com/jzgfD4cHos"
            ],
            "links_count": 1,
            "mentions": [
                "missremiashten",
                "breetheunikitty",
                "Elias824",
                "millselle"
            ],
            "mentions_count": 4,
            "parent": "",
            "place_context": "ABOUT",
            "place_id": "",
            "place_name": "",
            "provider_type": "SCRAPED",
            "retweet_count": 0,
            "screen_name": "prettylittlerem",
            "source_type": "TWITTER",
            "text": "THIS DANCE VID WAS AMAZING!!!! TAKE THAT ELLE! @missremiashten @breetheunikitty @Elias824 @millselle https://pic.twitter.com/jzgfD4cHos",
            "text_length": 135,
            "timestamp": "2017-05-30T19:41:02.825Z",
            "unshorten": {},
            "user": {
                "appearance_first": "2017-05-30T19:41:02.825Z",
                "appearance_latest": "2017-05-30T19:41:02.825Z",
                "name": "MISS REMI ASHTEN",
                "profile_image_url_https": "https://pbs.twimg.com/profile_images/848365216027078656/676UcZuR_bigger.jpg",
                "screen_name": "prettylittlerem",
                "user_id": "843235916093231106"
            },
            "videos": [
                "https://video.twimg.com/ext_tw_video/869639124390625280/pu/vid/240x240/zp2c3364FDfSRej7.mp4",
                "https://video.twimg.com/ext_tw_video/869639124390625280/pu/vid/480x480/W7hByhvogFvZlTfZ.mp4",
                "https://video.twimg.com/ext_tw_video/869639124390625280/pu/pl/vg2UXk7rssHUq0Ga.m3u8"
            ],
            "videos_count": 3,
            "without_l_len": 100,
            "without_lu_len": 46,
            "without_luh_len": 46
        },
        ...

I have:

  • There is a corresponding issue for this pull request.
  • Mentioned the Issue number in the pull request commit message Fixes #<number> commit message
  • There is only strictly only one commit per issue.

For the reviewers

I have:

  • Reviewed this pull request by an authorized contributor.
  • The reviewer is assigned to the pull request.
@hemantjadon

This comment has been minimized.

Show comment
Hide comment
@hemantjadon

hemantjadon May 31, 2017

Hi, @singhpratyush I don't know why this seems to have modified the whole file can you please check?

hemantjadon commented May 31, 2017

Hi, @singhpratyush I don't know why this seems to have modified the whole file can you please check?

@vibhcool

This comment has been minimized.

Show comment
Hide comment
@vibhcool

vibhcool May 31, 2017

Member

@singhpratyush , could you add part of output in gist, and share it's link instead of complete output,
btw good job 👍

Member

vibhcool commented May 31, 2017

@singhpratyush , could you add part of output in gist, and share it's link instead of complete output,
btw good job 👍

@singhpratyush

This comment has been minimized.

Show comment
Hide comment
@singhpratyush

singhpratyush May 31, 2017

Member

@vibhcool: Done! Thanks.

@hemantjadon: I'm not sure but I guess it's about the line breaks. In the earlier version, it was CRLF and I converted them to LF.

Member

singhpratyush commented May 31, 2017

@vibhcool: Done! Thanks.

@hemantjadon: I'm not sure but I guess it's about the line breaks. In the earlier version, it was CRLF and I converted them to LF.

@hemantjadon

This comment has been minimized.

Show comment
Hide comment
@hemantjadon

hemantjadon May 31, 2017

@singhpratyush Yeah that maybe the case. We can create a .editorconfig the file which will handle such cases, so that such thing doesn't happen and uniformity is maintained throughout with all devs?

hemantjadon commented May 31, 2017

@singhpratyush Yeah that maybe the case. We can create a .editorconfig the file which will handle such cases, so that such thing doesn't happen and uniformity is maintained throughout with all devs?

@singhpratyush

This comment has been minimized.

Show comment
Hide comment
@singhpratyush

singhpratyush May 31, 2017

Member

@hemantjadon: That would be good, but should go to another PR I guess.

Member

singhpratyush commented May 31, 2017

@hemantjadon: That would be good, but should go to another PR I guess.

@vibhcool

This comment has been minimized.

Show comment
Hide comment
@vibhcool

vibhcool May 31, 2017

Member

@singhpratyush , it is difficult to review complete module of 835 lines , please fix the it. 😅

Member

vibhcool commented May 31, 2017

@singhpratyush , it is difficult to review complete module of 835 lines , please fix the it. 😅

Fix #1205: Extract complete video URLs for Tweets
    This implementation mimics the video playback flow of mobile react app of Twitter.
    1. Extract BEARER_TOKEN holding script's URL.
    2. Extract guest session token.
    3. Extract BEARER_TOKEN from URL in 1.
    4. Make Twitter API call with the parameters.
@singhpratyush

This comment has been minimized.

Show comment
Hide comment
Member

singhpratyush commented May 31, 2017

@vibhcool

LGTM

@mariobehling mariobehling merged commit 7c513ae into loklak:development Jun 1, 2017

2 checks passed

codacy/pr Good work! A positive pull request.
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@singhpratyush singhpratyush deleted the singhpratyush:1171 branch Jun 15, 2017

mariobehling added a commit that referenced this pull request Jul 5, 2017

Update Master with Development branch (#1281)
* deploy button info for docker #1001

* Fixes #1045 : Replace the image logo in navigation bar with a text


Fixes #1045 : Replace the image logo in navigation bar with a text

* Fixes #1048: fix execution of method without query string

* fix for latest twitter html change

* add docker status badge

This is related to issue #1049

* update documentation path

Problem: The documentation has moved. The links in the README are outdated
Solution: insert the containing folder into the path of all links to docs

* moved dockerfile to root folder #1049

* changed travis build to new location

* updated Dockerfile path for compose

This is for issue #1049.
The pull request #1050 is a precondition for this to make sense.

Problem: the Dockerfile was moved to /
Solution: adapt the path

* get aggregations also with fresh requests from twitter with source=all

* Fixes #1060: Increase default Xmx value

* Fixed #1059 - Remove and Ignore .DS_Store

* Fixes #1067 - Tweet URL in README is broken

* corrected heading

"Where do I find the java?" ->"Where do I find the Java documentation?"

* Using the note directive of sphinx

See #1042

* README.md upd, useful links added

* Fixes #1033, loklak_server README.md upd, links updated with link syntax

* Move documentation site

The documentation site is now moved to https://github.com/loklak/dev.loklak.org

Closes #1014

* fix username emoji in tweet

* Fix unused imports in python files(codacy issue)

Related to #1070

* removed .DS_Store

* added .DS_Store to gitignore

* Fix use of Null in scala code

Related to #1070

* fixed scraper

* Edited Readme

* Add update trigger script for docs

Closes #1003

* Creating Volume for persistence while deploying via docker, fix #1051 (#1089)

* Update Dockerfile

* Update Dockerfile

* Update docker-compose.yml

* Update docker-compose.yml

* Update docker-compose.yml

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* Update docker-compose.yml

* Update docker-compose.yml

* Update docker-compose.yml

* Update docker-compose.yml

* updated docker build badge

I changed the url so github requests a new image.
The build works.
https://hub.docker.com/r/mariobehling/loklak/builds/

* Docker: Consistent Volume Path

Problem: docker-compose volume path is not the same as the dockerfile volume path
Solution: Set the docker-compose volume path to the dockerfile volume path

You can view the correct path in the Dockerfile:
https://github.com/loklak/loklak_server/blob/7a1f0378dc40ec25eec6083e43558a62408d84e8/Dockerfile#L38
I checked in the container:
```
bash-4.3# ls /loklak_server/
bin              conf             gradlew          settings.gradle
build            data             html             src
build.gradle     gradle           installation     ssi
bash-4.3# ls /
bin            lib            proc           srv            var
dev            loklak_server  root           sys
etc            media          run            tmp
home           mnt            sbin           usr
```
the data directory exists and is filled within `/loklak_server`

* .travis.yml: Add keys for dev.loklak.org

Closes #1091

* fix initGet

* option to autodelete messages after one month from the main index

* disabling feature introduced with
27272ee
for issue #919

The storage of the settings file caused that the settings file was
broken. It blew up to a huge file, like
$ ls -l customized_config.properties
-rw-r--r-- 1 loklak loklak 251650030 Apr 10 19:08
customized_config.properties

This is the main cause that loklak.org was down since this feature was
introduced.

* Fixes #1099 : Changes the href link of the button download, install and extend

* fix #1056 - document how to start contributing (#1063)

* Added JS EventListener to resize dump iframe on load. Closes #1101

* Add Unit Tests to Loklak Server (#1098)

* Add unit tests for TwitterScraper.java

* Add data file to test JSONRandomAccessFileTest.java

* set up unit tests build in loklak Server

* fix changes requested and codacy issues

* fixes scrollbar event

* at the twitter scraper now use more readable version of assert, also fix bug with parse long in youtube scraper(fails on Long.parse method, because spaces are not removed), add unit test for youtube scrapper.

* fix bug with youtube scrapper and add unit test for scraper

* Fixes #1103: Changed the URLs to the correct ones (#1104)

* Fixes #1103: Changed the URLs to the correct ones

* Fixes #1108: Fixed the typos in documentation

* fix and modify the GithubProfileScraper.java

* fixes #961: add query in KaizenHarverster's queue to get older Tweets

In case if the current timeline's query already has an until statement, replace it's date part with the oldest one. Also add DateFormat object in KaizenHarverster to parse Date into String of format yyyy-MM-dd.

* fix eclipse classpath for storing classes (#1097)

* Fixes #1123: Adding Gemnasium Button & Fixing Docker build button

* Fix Codacy issue in Timeline.java. Related #1070

Link to codacy: https://www.codacy.com/app/sudheesh1995/loklak_server/file/6470204147/issues/source?bid=3495500&fileBranchId=3495500
Description: Fields should be declared at the top of the class

* Fix Codacy issue for some files in org.loklak.server.api. Related #1070

* ConsoleService.java
  - Fields should be declared at the top of the class
  - https://www.codacy.com/app/sudheesh1995/loklak_server/file/6484902617/issues/source?bid=3495500&fileBranchId=3495500

* EventBriteCrawler.java
  - Make spacing consistent for conditionals

* GraphServlet.java
  - Reduce complexity of doGet method
  - https://www.codacy.com/app/sudheesh1995/loklak_server/file/6484903642/issues/source?bid=3495500&fileBranchId=3495500

* Rename Dockerfile-learnings.md to docs/Dockerfile-learnings.md

* fix #1138: Correct spelling mistake in README.md (#1140)

Change "descripe" to "describe" in How to Contribute section.

* Fixes #1123: Adding Gemnasium Button & Fixing Docker build button in rst file (#1137)

* Related #1070: Fix Codacy issues for files in org.loklak.api.search (#1134)

* EventBriteCrawlerService.java
  - Use one line for each declaration
  - https://www.codacy.com/app/sudheesh1995/loklak_server/file/6497733425/issues/source?bid=3495500&fileBranchId=3495500

* GenericScraper.java
  - Indentation fix
  - New line before EOF

* GithubProfileScraper.java
  - Remove trailing whitespaces

* MeetupsCrawlerService.java
  - Use one line for each declaration
  - https://www.codacy.com/app/sudheesh1995/loklak_server/file/6497733676/issues/source?bid=3495500&fileBranchId=3495500

* SearchServlet.java
  - Indentation fix

* SuggestServlet.java
  - Position literals first in String comparisons
  - Fields should be declared at the top of the class
  - https://www.codacy.com/app/sudheesh1995/loklak_server/file/6497733640/issues/source?bid=3495500&fileBranchId=3495500

* WeiboUserInfo.java
  - Switch statements should have a default label
  - https://www.codacy.com/app/sudheesh1995/loklak_server/file/6497733550/issues/source?bid=3495500&fileBranchId=3495500

* Fixes #1139: Changed the URL (#1141)

* Fixes #1139: Changed the URL

* Fixes #1139: Changed the URL

* Fix "Strings must use doublequote. (quotes)"
Related to loklak#1070

* Fix #1070: Strings must use doublequote. (quotes), no-use-before-define

* Fix #1070: Strings must use doublequote. (quotes), no-use-before-define

* Fixes #1070:Strings must use double quotes, no-use-before-define

* Related to #1070:Strings must use double quotes, no-use-before-define

* Related #1058: Add Kaizen harvester usage documentation (#1145)

* Fix #1070: Strings must use doublequote. (quotes), no-use-before-define (#1121)

* Fix "Strings must use doublequote. (quotes)"
Related to loklak#1070

* Fix #1070: Strings must use doublequote. (quotes), no-use-before-define

* Fix #1070: Strings must use doublequote. (quotes), no-use-before-define

* fix #1130: Make retries and back off parameter for backend push configurable (#1131)

These variables can be set from config.properties by changing/defining caretaker.backendpush.retries and caretaker.backendpush.backoff respectively.

* Fixes part of #1132: Add unit test to check TwitterScraper output (#1133)

* convert markdown file to rst (#1142)

* Merged development fixed conflict.

* Improve code quality for org.loklak.geo.*

* Related #1070: Improve code quality for org.loklak.api.admin.* (#1149)

* Related to #1070: Improve code quality for org.loklak.Crawler.java

* fix related to #1152: code refractoring for logging (#1153)

* fix related to #1133: fix access specifiers (#1151)

* fixes #1161: Add GCloud Kubernetes deployment document for loklak (#1162)

* fixes #1146: Check for TwitterFactory before getting instance (#1147)

* Related #1070: Fix Codacy issues for org.loklak.api.amazon.* (#1163)

* fix #1143: Fix NumberException in YoutubeScraper (#1157)

* Installation and Start on a user specified port (#1159)

Solves issue: #925

* Fixes #1165: Fixed the QuoraProfileScraper and displaying profileImage

* Related to #1112:Add filter for images, videos (#1164)

* Related #1156: Make harvesting decision biased for Kaizen (#1158)

A probability is chosen as queuries.size() / QUERIES_LIMIT, which is compared to a randomly chosen target probability and decision is taken accordingly. In case of no limit on the queue size, probability to harvest is set to 0.5.

* Fixes #1167 GithubScraperService able to scrape user specific data (#1168)

Fixes issue #1167.githubprofilescraper service now displays starred_url,
number of starred repos,followers_url, number of followers, following_url,
number of people following for a particuler user.

* fixes #1114 Improve URL shortening service

* Include all 30X HTTP response code while checking for redirect.
* Use POST requests as fallback for GET requests - There are many cases (mostly https?://fb.me/*) when GET requests give status 400: Bad Request, while POST request works fine. The patch will allow to make an attemt for POST request for such cases and fetch the result.
* Try to fetch URL from <meta/> tag in response body in case of non redirect status code.
* Check the validity of URL shortening only once, and not for each intermediate URL.

* Displays proper url to open loklak_server

Solves issue: #1172

Displays proper localhost url in which loklak_server is running after
the execution of bin/start.sh or bin/installation.sh with a "p" flag.

Earlier the localhost url only displayed port 9000 at the end in case of
bin/start.sh and concatenated the running port with 9000 in case of
bin/stop.sh.Ex:
http://localhost:9000 # bin/start.sh, actual port 8888
http://localhost:90008888 #bin/installation.sh, actual port 8888

* fixes #1177 - Added tests for WordpressCrawlerService.java

fixes issue #1177. Added tests for WordpressCrawlerService.java and
also removed the leading 'Author' from the author field in json
output.

* fix #1176: Fetch debug flag from config file

Change configurations for TwitterScraper and ClientConnection

* fixes #1184 - Instagram Profile Scraper is now working

fixes issue #1184. Instagram scraper is now returning data.

* fix #1179: Use java.net.URL to build relative URL in ClientConnection (#1183)

* fixes #1070: Add test for URL unshortening (#1173)

* fixes #1169 - Added test for Github profile scraper (#1185)

fixes issue #1169, Added tests for GithubProfileScraper service.

* Improve code quality for some files in org.loklak.api.cms and add checkstyle as gradle task (#1187)

* Related #1070: Improve code quality for some files in org.loklak.api.cms

Fixes are done using checkstyle with google_check.xml config and 4 space indentation level

* Add checkstyle check as gradle task

* Fixes #1191: NullPointerException in CareTaker.java (#1192)

* Auto-generate docs in dev.loklak.org repository (#1195)

* Fix #1171: Extract video URLs from IFrame (#1193)

Videos are added as an IFrame for Twitter. To fetch the video URLs, we first fetch the IFrame page and then check for the video format. If it is mp4, we're done. If it is m3u8, we need to fetch the m3u8 link in order to get actual videos. Mostly, these videos are of .ts format.

Also add org.unbescape as gradle dependency to unescape string in iframe.

* FIx #1201: Break down KaizenHarvester into simpler pieces (#1203)

Introduce KaizenQuery class to support different methods to store queries that Kaizen needs to process

* Fix #1208: Add .editorconfig (#1209)

* Fixes #1204 Add subtree if not already added (#1207)

* Fix #1205: Extract complete video URLs for Tweets (#1206)

This implementation mimics the video playback flow of mobile react app of Twitter.
    1. Extract BEARER_TOKEN holding script's URL.
    2. Extract guest session token.
    3. Extract BEARER_TOKEN from URL in 1.
    4. Make Twitter API call with the parameters.

* fixes #1196 - Enhanced Quora profile scraper #1199 (#1200)

Fixes issue #1196 The scraper now provides more information like
university of user, location where user works, topics he knows, number of
followers, number of questions, number of edits, number of blogs etc.

* Fix #1188: Use unbescape to unescape HTML in html2utf8 (#1194)

Also improve whitespace cleaning in the method. Move old implementation to html2utf8Custom.

* Fixes #1097: Restore access specifiers in TwitterScraper.java (#1198)

* Fix indentation (#1211)

* Fix #1212: fix checkstyle errors(except missing javadoc) (#1218)

* Fixes #1215 fix syntax error in the script (#1217)

* Fix #1213: Include videos for testing TwitterScraper (#1221)

* Fix 1216: Revert "Installation and Start on a user specified port (#1159)" (#1227)

This reverts commit 1e0bcd5.

Conflicts (resolved):
	bin/installation.sh
	bin/start.sh

* Fixes #1202: Modify loggers in Loklak Server for testing (#1222)

* Fixes #1219: Add UTC time in TimeAndDateService (#1220)

* Fixes #1112: Add image, video filter constraints for cache (#1190)

* Fixes #1236: Update Docs for get parameter (#1237)

* Fixes #1226 Build error currently showing (#1228)

* Fixes 1215 Fix relative link

* Update git to work with subtree

* Adding echo statements

* Fix #1239: Correct flag values in config.properties

* Fix #1238: Add PriorityQueue harvesting strategy (#1240)

Also add score related to each Tweet based on retweet and favourite count.

* Fix #1251: Correct test case for RedirectUnshortener (#1253)

http://t.co/E3w7s2qdBT now points to http://www.mostviralfeed.com/what-lady-gaga-actually-looks-like instead of http://mostviralfeed.com/what-lady-gaga-actually-looks-like

* Fix #1247: Add function to collect stats about all classes for a classifier (#1248)

* Fix #1256: Add classifier.json endpoint to serve aggregated data (#1257)

* refactoring to have the same naming as in susi_server

* Fixes #1261: RedirectUnshortener link fix (#1262)

* Fixes #1229, #1235, Related #1230: Setup of testable version (#1250)

1) setup post and basescraper

2) Setup quoraprofilescraper with basescraper and post

* Fix #1259: Add function for time sensitive aggregation (#1260)

* Fix #1271: Correct redirect link in test (#1272)

* Fix #1266: Allow time based aggregation in /api/classifier.json (#1267)

* Fix #1278: Correct typo in kaizen.md (#1279)

* enhanced elasticsearch mapping

* eclipse classpath to use same as gradle

* removed unused imports

* Fix #1268: Add function for aggregation based on country codes (#1270)

Following operations are now possible -
* All time aggregation for all countries
* Time sensitive aggregation for all countries
* All previous aggregations for selected countries

* Fix #1273: Add Jacoco to provide coverage report in XML format (#1274)

* Fixes 1284: Improve test cases for URL unshortener (#1285)

* Setup post and basescraper with QuoraProfileScraper (#1249)

* Setup of testable version

setup post and basescraper

* Related #1230, 1231, 1244: integrate Timeline2 with quorascraper

* Configure ssh agent before push

vibhcool added a commit to vibhcool/loklak_server that referenced this pull request Jul 6, 2017

Update Master with Development branch (#1281)
* deploy button info for docker #1001

* Fixes #1045 : Replace the image logo in navigation bar with a text

Fixes #1045 : Replace the image logo in navigation bar with a text

* Fixes #1048: fix execution of method without query string

* fix for latest twitter html change

* add docker status badge

This is related to issue #1049

* update documentation path

Problem: The documentation has moved. The links in the README are outdated
Solution: insert the containing folder into the path of all links to docs

* moved dockerfile to root folder #1049

* changed travis build to new location

* updated Dockerfile path for compose

This is for issue #1049.
The pull request #1050 is a precondition for this to make sense.

Problem: the Dockerfile was moved to /
Solution: adapt the path

* get aggregations also with fresh requests from twitter with source=all

* Fixes #1060: Increase default Xmx value

* Fixed #1059 - Remove and Ignore .DS_Store

* Fixes #1067 - Tweet URL in README is broken

* corrected heading

"Where do I find the java?" ->"Where do I find the Java documentation?"

* Using the note directive of sphinx

See #1042

* README.md upd, useful links added

* Fixes #1033, loklak_server README.md upd, links updated with link syntax

* Move documentation site

The documentation site is now moved to https://github.com/loklak/dev.loklak.org

Closes loklak#1014

* fix username emoji in tweet

* Fix unused imports in python files(codacy issue)

Related to loklak#1070

* removed .DS_Store

* added .DS_Store to gitignore

* Fix use of Null in scala code

Related to loklak#1070

* fixed scraper

* Edited Readme

* Add update trigger script for docs

Closes loklak#1003

* Creating Volume for persistence while deploying via docker, fix #1051 (#1089)

* Update Dockerfile

* Update Dockerfile

* Update docker-compose.yml

* Update docker-compose.yml

* Update docker-compose.yml

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* Update docker-compose.yml

* Update docker-compose.yml

* Update docker-compose.yml

* Update docker-compose.yml

* updated docker build badge

I changed the url so github requests a new image.
The build works.
https://hub.docker.com/r/mariobehling/loklak/builds/

* Docker: Consistent Volume Path

Problem: docker-compose volume path is not the same as the dockerfile volume path
Solution: Set the docker-compose volume path to the dockerfile volume path

You can view the correct path in the Dockerfile:
https://github.com/loklak/loklak_server/blob/7a1f0378dc40ec25eec6083e43558a62408d84e8/Dockerfile#L38
I checked in the container:
```
bash-4.3# ls /loklak_server/
bin              conf             gradlew          settings.gradle
build            data             html             src
build.gradle     gradle           installation     ssi
bash-4.3# ls /
bin            lib            proc           srv            var
dev            loklak_server  root           sys
etc            media          run            tmp
home           mnt            sbin           usr
```
the data directory exists and is filled within `/loklak_server`

* .travis.yml: Add keys for dev.loklak.org

Closes loklak#1091

* fix initGet

* option to autodelete messages after one month from the main index

* disabling feature introduced with
27272ee
for issue loklak#919

The storage of the settings file caused that the settings file was
broken. It blew up to a huge file, like
$ ls -l customized_config.properties
-rw-r--r-- 1 loklak loklak 251650030 Apr 10 19:08
customized_config.properties

This is the main cause that loklak.org was down since this feature was
introduced.

* Fixes #1099 : Changes the href link of the button download, install and extend

* fix #1056 - document how to start contributing (#1063)

* Added JS EventListener to resize dump iframe on load. Closes #1101

* Add Unit Tests to Loklak Server (#1098)

* Add unit tests for TwitterScraper.java

* Add data file to test JSONRandomAccessFileTest.java

* set up unit tests build in loklak Server

* fix changes requested and codacy issues

* fixes scrollbar event

* at the twitter scraper now use more readable version of assert, also fix bug with parse long in youtube scraper(fails on Long.parse method, because spaces are not removed), add unit test for youtube scrapper.

* fix bug with youtube scrapper and add unit test for scraper

* Fixes #1103: Changed the URLs to the correct ones (#1104)

* Fixes #1103: Changed the URLs to the correct ones

* Fixes #1108: Fixed the typos in documentation

* fix and modify the GithubProfileScraper.java

* fixes #961: add query in KaizenHarverster's queue to get older Tweets

In case if the current timeline's query already has an until statement, replace it's date part with the oldest one. Also add DateFormat object in KaizenHarverster to parse Date into String of format yyyy-MM-dd.

* fix eclipse classpath for storing classes (#1097)

* Fixes #1123: Adding Gemnasium Button & Fixing Docker build button

* Fix Codacy issue in Timeline.java. Related #1070

Link to codacy: https://www.codacy.com/app/sudheesh1995/loklak_server/file/6470204147/issues/source?bid=3495500&fileBranchId=3495500
Description: Fields should be declared at the top of the class

* Fix Codacy issue for some files in org.loklak.server.api. Related #1070

* ConsoleService.java
  - Fields should be declared at the top of the class
  - https://www.codacy.com/app/sudheesh1995/loklak_server/file/6484902617/issues/source?bid=3495500&fileBranchId=3495500

* EventBriteCrawler.java
  - Make spacing consistent for conditionals

* GraphServlet.java
  - Reduce complexity of doGet method
  - https://www.codacy.com/app/sudheesh1995/loklak_server/file/6484903642/issues/source?bid=3495500&fileBranchId=3495500

* Rename Dockerfile-learnings.md to docs/Dockerfile-learnings.md

* fix #1138: Correct spelling mistake in README.md (#1140)

Change "descripe" to "describe" in How to Contribute section.

* Fixes #1123: Adding Gemnasium Button & Fixing Docker build button in rst file (#1137)

* Related #1070: Fix Codacy issues for files in org.loklak.api.search (#1134)

* EventBriteCrawlerService.java
  - Use one line for each declaration
  - https://www.codacy.com/app/sudheesh1995/loklak_server/file/6497733425/issues/source?bid=3495500&fileBranchId=3495500

* GenericScraper.java
  - Indentation fix
  - New line before EOF

* GithubProfileScraper.java
  - Remove trailing whitespaces

* MeetupsCrawlerService.java
  - Use one line for each declaration
  - https://www.codacy.com/app/sudheesh1995/loklak_server/file/6497733676/issues/source?bid=3495500&fileBranchId=3495500

* SearchServlet.java
  - Indentation fix

* SuggestServlet.java
  - Position literals first in String comparisons
  - Fields should be declared at the top of the class
  - https://www.codacy.com/app/sudheesh1995/loklak_server/file/6497733640/issues/source?bid=3495500&fileBranchId=3495500

* WeiboUserInfo.java
  - Switch statements should have a default label
  - https://www.codacy.com/app/sudheesh1995/loklak_server/file/6497733550/issues/source?bid=3495500&fileBranchId=3495500

* Fixes #1139: Changed the URL (#1141)

* Fixes #1139: Changed the URL

* Fixes #1139: Changed the URL

* Fix "Strings must use doublequote. (quotes)"
Related to loklak#1070

* Fix #1070: Strings must use doublequote. (quotes), no-use-before-define

* Fix #1070: Strings must use doublequote. (quotes), no-use-before-define

* Fixes #1070:Strings must use double quotes, no-use-before-define

* Related to #1070:Strings must use double quotes, no-use-before-define

* Related #1058: Add Kaizen harvester usage documentation (#1145)

* Fix #1070: Strings must use doublequote. (quotes), no-use-before-define (#1121)

* Fix "Strings must use doublequote. (quotes)"
Related to loklak#1070

* Fix #1070: Strings must use doublequote. (quotes), no-use-before-define

* Fix #1070: Strings must use doublequote. (quotes), no-use-before-define

* fix #1130: Make retries and back off parameter for backend push configurable (#1131)

These variables can be set from config.properties by changing/defining caretaker.backendpush.retries and caretaker.backendpush.backoff respectively.

* Fixes part of #1132: Add unit test to check TwitterScraper output (#1133)

* convert markdown file to rst (#1142)

* Merged development fixed conflict.

* Improve code quality for org.loklak.geo.*

* Related #1070: Improve code quality for org.loklak.api.admin.* (#1149)

* Related to #1070: Improve code quality for org.loklak.Crawler.java

* fix related to #1152: code refractoring for logging (#1153)

* fix related to #1133: fix access specifiers (#1151)

* fixes #1161: Add GCloud Kubernetes deployment document for loklak (#1162)

* fixes #1146: Check for TwitterFactory before getting instance (#1147)

* Related #1070: Fix Codacy issues for org.loklak.api.amazon.* (#1163)

* fix #1143: Fix NumberException in YoutubeScraper (#1157)

* Installation and Start on a user specified port (#1159)

Solves issue: loklak#925

* Fixes #1165: Fixed the QuoraProfileScraper and displaying profileImage

* Related to #1112:Add filter for images, videos (#1164)

* Related #1156: Make harvesting decision biased for Kaizen (#1158)

A probability is chosen as queuries.size() / QUERIES_LIMIT, which is compared to a randomly chosen target probability and decision is taken accordingly. In case of no limit on the queue size, probability to harvest is set to 0.5.

* Fixes #1167 GithubScraperService able to scrape user specific data (#1168)

Fixes issue #1167.githubprofilescraper service now displays starred_url,
number of starred repos,followers_url, number of followers, following_url,
number of people following for a particuler user.

* fixes #1114 Improve URL shortening service

* Include all 30X HTTP response code while checking for redirect.
* Use POST requests as fallback for GET requests - There are many cases (mostly https?://fb.me/*) when GET requests give status 400: Bad Request, while POST request works fine. The patch will allow to make an attemt for POST request for such cases and fetch the result.
* Try to fetch URL from <meta/> tag in response body in case of non redirect status code.
* Check the validity of URL shortening only once, and not for each intermediate URL.

* Displays proper url to open loklak_server

Solves issue: loklak#1172

Displays proper localhost url in which loklak_server is running after
the execution of bin/start.sh or bin/installation.sh with a "p" flag.

Earlier the localhost url only displayed port 9000 at the end in case of
bin/start.sh and concatenated the running port with 9000 in case of
bin/stop.sh.Ex:
http://localhost:9000 # bin/start.sh, actual port 8888
http://localhost:90008888 #bin/installation.sh, actual port 8888

* fixes #1177 - Added tests for WordpressCrawlerService.java

fixes issue #1177. Added tests for WordpressCrawlerService.java and
also removed the leading 'Author' from the author field in json
output.

* fix #1176: Fetch debug flag from config file

Change configurations for TwitterScraper and ClientConnection

* fixes #1184 - Instagram Profile Scraper is now working

fixes issue #1184. Instagram scraper is now returning data.

* fix #1179: Use java.net.URL to build relative URL in ClientConnection (#1183)

* fixes #1070: Add test for URL unshortening (#1173)

* fixes #1169 - Added test for Github profile scraper (#1185)

fixes issue #1169, Added tests for GithubProfileScraper service.

* Improve code quality for some files in org.loklak.api.cms and add checkstyle as gradle task (#1187)

* Related #1070: Improve code quality for some files in org.loklak.api.cms

Fixes are done using checkstyle with google_check.xml config and 4 space indentation level

* Add checkstyle check as gradle task

* Fixes #1191: NullPointerException in CareTaker.java (#1192)

* Auto-generate docs in dev.loklak.org repository (#1195)

* Fix #1171: Extract video URLs from IFrame (#1193)

Videos are added as an IFrame for Twitter. To fetch the video URLs, we first fetch the IFrame page and then check for the video format. If it is mp4, we're done. If it is m3u8, we need to fetch the m3u8 link in order to get actual videos. Mostly, these videos are of .ts format.

Also add org.unbescape as gradle dependency to unescape string in iframe.

* FIx #1201: Break down KaizenHarvester into simpler pieces (#1203)

Introduce KaizenQuery class to support different methods to store queries that Kaizen needs to process

* Fix #1208: Add .editorconfig (#1209)

* Fixes #1204 Add subtree if not already added (#1207)

* Fix #1205: Extract complete video URLs for Tweets (#1206)

This implementation mimics the video playback flow of mobile react app of Twitter.
    1. Extract BEARER_TOKEN holding script's URL.
    2. Extract guest session token.
    3. Extract BEARER_TOKEN from URL in 1.
    4. Make Twitter API call with the parameters.

* fixes #1196 - Enhanced Quora profile scraper #1199 (#1200)

Fixes issue #1196 The scraper now provides more information like
university of user, location where user works, topics he knows, number of
followers, number of questions, number of edits, number of blogs etc.

* Fix #1188: Use unbescape to unescape HTML in html2utf8 (#1194)

Also improve whitespace cleaning in the method. Move old implementation to html2utf8Custom.

* Fixes #1097: Restore access specifiers in TwitterScraper.java (#1198)

* Fix indentation (#1211)

* Fix #1212: fix checkstyle errors(except missing javadoc) (#1218)

* Fixes #1215 fix syntax error in the script (#1217)

* Fix #1213: Include videos for testing TwitterScraper (#1221)

* Fix 1216: Revert "Installation and Start on a user specified port (#1159)" (#1227)

This reverts commit 1e0bcd5.

Conflicts (resolved):
	bin/installation.sh
	bin/start.sh

* Fixes #1202: Modify loggers in Loklak Server for testing (#1222)

* Fixes #1219: Add UTC time in TimeAndDateService (#1220)

* Fixes #1112: Add image, video filter constraints for cache (#1190)

* Fixes #1236: Update Docs for get parameter (#1237)

* Fixes #1226 Build error currently showing (#1228)

* Fixes 1215 Fix relative link

* Update git to work with subtree

* Adding echo statements

* Fix #1239: Correct flag values in config.properties

* Fix #1238: Add PriorityQueue harvesting strategy (#1240)

Also add score related to each Tweet based on retweet and favourite count.

* Fix #1251: Correct test case for RedirectUnshortener (#1253)

http://t.co/E3w7s2qdBT now points to http://www.mostviralfeed.com/what-lady-gaga-actually-looks-like instead of http://mostviralfeed.com/what-lady-gaga-actually-looks-like

* Fix #1247: Add function to collect stats about all classes for a classifier (#1248)

* Fix #1256: Add classifier.json endpoint to serve aggregated data (#1257)

* refactoring to have the same naming as in susi_server

* Fixes #1261: RedirectUnshortener link fix (#1262)

* Fixes #1229, #1235, Related #1230: Setup of testable version (#1250)

1) setup post and basescraper

2) Setup quoraprofilescraper with basescraper and post

* Fix #1259: Add function for time sensitive aggregation (#1260)

* Fix #1271: Correct redirect link in test (#1272)

* Fix #1266: Allow time based aggregation in /api/classifier.json (#1267)

* Fix #1278: Correct typo in kaizen.md (#1279)

* enhanced elasticsearch mapping

* eclipse classpath to use same as gradle

* removed unused imports

* Fix #1268: Add function for aggregation based on country codes (#1270)

Following operations are now possible -
* All time aggregation for all countries
* Time sensitive aggregation for all countries
* All previous aggregations for selected countries

* Fix #1273: Add Jacoco to provide coverage report in XML format (#1274)

* Fixes 1284: Improve test cases for URL unshortener (#1285)

* Setup post and basescraper with QuoraProfileScraper (#1249)

* Setup of testable version

setup post and basescraper

* Related #1230, 1231, 1244: integrate Timeline2 with quorascraper

* Configure ssh agent before push

vibhcool added a commit to vibhcool/loklak_server that referenced this pull request Jul 6, 2017

Update Master with Development branch (#1281)
* deploy button info for docker #1001

* Fixes #1045 : Replace the image logo in navigation bar with a text

Fixes #1045 : Replace the image logo in navigation bar with a text

* Fixes #1048: fix execution of method without query string

* fix for latest twitter html change

* add docker status badge

This is related to issue #1049

* update documentation path

Problem: The documentation has moved. The links in the README are outdated
Solution: insert the containing folder into the path of all links to docs

* moved dockerfile to root folder #1049

* changed travis build to new location

* updated Dockerfile path for compose

This is for issue #1049.
The pull request #1050 is a precondition for this to make sense.

Problem: the Dockerfile was moved to /
Solution: adapt the path

* get aggregations also with fresh requests from twitter with source=all

* Fixes #1060: Increase default Xmx value

* Fixed #1059 - Remove and Ignore .DS_Store

* Fixes #1067 - Tweet URL in README is broken

* corrected heading

"Where do I find the java?" ->"Where do I find the Java documentation?"

* Using the note directive of sphinx

See #1042

* README.md upd, useful links added

* Fixes #1033, loklak_server README.md upd, links updated with link syntax

* Move documentation site

The documentation site is now moved to https://github.com/loklak/dev.loklak.org

Closes loklak#1014

* fix username emoji in tweet

* Fix unused imports in python files(codacy issue)

Related to loklak#1070

* removed .DS_Store

* added .DS_Store to gitignore

* Fix use of Null in scala code

Related to loklak#1070

* fixed scraper

* Edited Readme

* Add update trigger script for docs

Closes loklak#1003

* Creating Volume for persistence while deploying via docker, fix #1051 (#1089)

* Update Dockerfile

* Update Dockerfile

* Update docker-compose.yml

* Update docker-compose.yml

* Update docker-compose.yml

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* Update docker-compose.yml

* Update docker-compose.yml

* Update docker-compose.yml

* Update docker-compose.yml

* updated docker build badge

I changed the url so github requests a new image.
The build works.
https://hub.docker.com/r/mariobehling/loklak/builds/

* Docker: Consistent Volume Path

Problem: docker-compose volume path is not the same as the dockerfile volume path
Solution: Set the docker-compose volume path to the dockerfile volume path

You can view the correct path in the Dockerfile:
https://github.com/loklak/loklak_server/blob/7a1f0378dc40ec25eec6083e43558a62408d84e8/Dockerfile#L38
I checked in the container:
```
bash-4.3# ls /loklak_server/
bin              conf             gradlew          settings.gradle
build            data             html             src
build.gradle     gradle           installation     ssi
bash-4.3# ls /
bin            lib            proc           srv            var
dev            loklak_server  root           sys
etc            media          run            tmp
home           mnt            sbin           usr
```
the data directory exists and is filled within `/loklak_server`

* .travis.yml: Add keys for dev.loklak.org

Closes loklak#1091

* fix initGet

* option to autodelete messages after one month from the main index

* disabling feature introduced with
27272ee
for issue loklak#919

The storage of the settings file caused that the settings file was
broken. It blew up to a huge file, like
$ ls -l customized_config.properties
-rw-r--r-- 1 loklak loklak 251650030 Apr 10 19:08
customized_config.properties

This is the main cause that loklak.org was down since this feature was
introduced.

* Fixes #1099 : Changes the href link of the button download, install and extend

* fix #1056 - document how to start contributing (#1063)

* Added JS EventListener to resize dump iframe on load. Closes #1101

* Add Unit Tests to Loklak Server (#1098)

* Add unit tests for TwitterScraper.java

* Add data file to test JSONRandomAccessFileTest.java

* set up unit tests build in loklak Server

* fix changes requested and codacy issues

* fixes scrollbar event

* at the twitter scraper now use more readable version of assert, also fix bug with parse long in youtube scraper(fails on Long.parse method, because spaces are not removed), add unit test for youtube scrapper.

* fix bug with youtube scrapper and add unit test for scraper

* Fixes #1103: Changed the URLs to the correct ones (#1104)

* Fixes #1103: Changed the URLs to the correct ones

* Fixes #1108: Fixed the typos in documentation

* fix and modify the GithubProfileScraper.java

* fixes #961: add query in KaizenHarverster's queue to get older Tweets

In case if the current timeline's query already has an until statement, replace it's date part with the oldest one. Also add DateFormat object in KaizenHarverster to parse Date into String of format yyyy-MM-dd.

* fix eclipse classpath for storing classes (#1097)

* Fixes #1123: Adding Gemnasium Button & Fixing Docker build button

* Fix Codacy issue in Timeline.java. Related #1070

Link to codacy: https://www.codacy.com/app/sudheesh1995/loklak_server/file/6470204147/issues/source?bid=3495500&fileBranchId=3495500
Description: Fields should be declared at the top of the class

* Fix Codacy issue for some files in org.loklak.server.api. Related #1070

* ConsoleService.java
  - Fields should be declared at the top of the class
  - https://www.codacy.com/app/sudheesh1995/loklak_server/file/6484902617/issues/source?bid=3495500&fileBranchId=3495500

* EventBriteCrawler.java
  - Make spacing consistent for conditionals

* GraphServlet.java
  - Reduce complexity of doGet method
  - https://www.codacy.com/app/sudheesh1995/loklak_server/file/6484903642/issues/source?bid=3495500&fileBranchId=3495500

* Rename Dockerfile-learnings.md to docs/Dockerfile-learnings.md

* fix #1138: Correct spelling mistake in README.md (#1140)

Change "descripe" to "describe" in How to Contribute section.

* Fixes #1123: Adding Gemnasium Button & Fixing Docker build button in rst file (#1137)

* Related #1070: Fix Codacy issues for files in org.loklak.api.search (#1134)

* EventBriteCrawlerService.java
  - Use one line for each declaration
  - https://www.codacy.com/app/sudheesh1995/loklak_server/file/6497733425/issues/source?bid=3495500&fileBranchId=3495500

* GenericScraper.java
  - Indentation fix
  - New line before EOF

* GithubProfileScraper.java
  - Remove trailing whitespaces

* MeetupsCrawlerService.java
  - Use one line for each declaration
  - https://www.codacy.com/app/sudheesh1995/loklak_server/file/6497733676/issues/source?bid=3495500&fileBranchId=3495500

* SearchServlet.java
  - Indentation fix

* SuggestServlet.java
  - Position literals first in String comparisons
  - Fields should be declared at the top of the class
  - https://www.codacy.com/app/sudheesh1995/loklak_server/file/6497733640/issues/source?bid=3495500&fileBranchId=3495500

* WeiboUserInfo.java
  - Switch statements should have a default label
  - https://www.codacy.com/app/sudheesh1995/loklak_server/file/6497733550/issues/source?bid=3495500&fileBranchId=3495500

* Fixes #1139: Changed the URL (#1141)

* Fixes #1139: Changed the URL

* Fixes #1139: Changed the URL

* Fix "Strings must use doublequote. (quotes)"
Related to loklak#1070

* Fix #1070: Strings must use doublequote. (quotes), no-use-before-define

* Fix #1070: Strings must use doublequote. (quotes), no-use-before-define

* Fixes #1070:Strings must use double quotes, no-use-before-define

* Related to #1070:Strings must use double quotes, no-use-before-define

* Related #1058: Add Kaizen harvester usage documentation (#1145)

* Fix #1070: Strings must use doublequote. (quotes), no-use-before-define (#1121)

* Fix "Strings must use doublequote. (quotes)"
Related to loklak#1070

* Fix #1070: Strings must use doublequote. (quotes), no-use-before-define

* Fix #1070: Strings must use doublequote. (quotes), no-use-before-define

* fix #1130: Make retries and back off parameter for backend push configurable (#1131)

These variables can be set from config.properties by changing/defining caretaker.backendpush.retries and caretaker.backendpush.backoff respectively.

* Fixes part of #1132: Add unit test to check TwitterScraper output (#1133)

* convert markdown file to rst (#1142)

* Merged development fixed conflict.

* Improve code quality for org.loklak.geo.*

* Related #1070: Improve code quality for org.loklak.api.admin.* (#1149)

* Related to #1070: Improve code quality for org.loklak.Crawler.java

* fix related to #1152: code refractoring for logging (#1153)

* fix related to #1133: fix access specifiers (#1151)

* fixes #1161: Add GCloud Kubernetes deployment document for loklak (#1162)

* fixes #1146: Check for TwitterFactory before getting instance (#1147)

* Related #1070: Fix Codacy issues for org.loklak.api.amazon.* (#1163)

* fix #1143: Fix NumberException in YoutubeScraper (#1157)

* Installation and Start on a user specified port (#1159)

Solves issue: loklak#925

* Fixes #1165: Fixed the QuoraProfileScraper and displaying profileImage

* Related to #1112:Add filter for images, videos (#1164)

* Related #1156: Make harvesting decision biased for Kaizen (#1158)

A probability is chosen as queuries.size() / QUERIES_LIMIT, which is compared to a randomly chosen target probability and decision is taken accordingly. In case of no limit on the queue size, probability to harvest is set to 0.5.

* Fixes #1167 GithubScraperService able to scrape user specific data (#1168)

Fixes issue #1167.githubprofilescraper service now displays starred_url,
number of starred repos,followers_url, number of followers, following_url,
number of people following for a particuler user.

* fixes #1114 Improve URL shortening service

* Include all 30X HTTP response code while checking for redirect.
* Use POST requests as fallback for GET requests - There are many cases (mostly https?://fb.me/*) when GET requests give status 400: Bad Request, while POST request works fine. The patch will allow to make an attemt for POST request for such cases and fetch the result.
* Try to fetch URL from <meta/> tag in response body in case of non redirect status code.
* Check the validity of URL shortening only once, and not for each intermediate URL.

* Displays proper url to open loklak_server

Solves issue: loklak#1172

Displays proper localhost url in which loklak_server is running after
the execution of bin/start.sh or bin/installation.sh with a "p" flag.

Earlier the localhost url only displayed port 9000 at the end in case of
bin/start.sh and concatenated the running port with 9000 in case of
bin/stop.sh.Ex:
http://localhost:9000 # bin/start.sh, actual port 8888
http://localhost:90008888 #bin/installation.sh, actual port 8888

* fixes #1177 - Added tests for WordpressCrawlerService.java

fixes issue #1177. Added tests for WordpressCrawlerService.java and
also removed the leading 'Author' from the author field in json
output.

* fix #1176: Fetch debug flag from config file

Change configurations for TwitterScraper and ClientConnection

* fixes #1184 - Instagram Profile Scraper is now working

fixes issue #1184. Instagram scraper is now returning data.

* fix #1179: Use java.net.URL to build relative URL in ClientConnection (#1183)

* fixes #1070: Add test for URL unshortening (#1173)

* fixes #1169 - Added test for Github profile scraper (#1185)

fixes issue #1169, Added tests for GithubProfileScraper service.

* Improve code quality for some files in org.loklak.api.cms and add checkstyle as gradle task (#1187)

* Related #1070: Improve code quality for some files in org.loklak.api.cms

Fixes are done using checkstyle with google_check.xml config and 4 space indentation level

* Add checkstyle check as gradle task

* Fixes #1191: NullPointerException in CareTaker.java (#1192)

* Auto-generate docs in dev.loklak.org repository (#1195)

* Fix #1171: Extract video URLs from IFrame (#1193)

Videos are added as an IFrame for Twitter. To fetch the video URLs, we first fetch the IFrame page and then check for the video format. If it is mp4, we're done. If it is m3u8, we need to fetch the m3u8 link in order to get actual videos. Mostly, these videos are of .ts format.

Also add org.unbescape as gradle dependency to unescape string in iframe.

* FIx #1201: Break down KaizenHarvester into simpler pieces (#1203)

Introduce KaizenQuery class to support different methods to store queries that Kaizen needs to process

* Fix #1208: Add .editorconfig (#1209)

* Fixes #1204 Add subtree if not already added (#1207)

* Fix #1205: Extract complete video URLs for Tweets (#1206)

This implementation mimics the video playback flow of mobile react app of Twitter.
    1. Extract BEARER_TOKEN holding script's URL.
    2. Extract guest session token.
    3. Extract BEARER_TOKEN from URL in 1.
    4. Make Twitter API call with the parameters.

* fixes #1196 - Enhanced Quora profile scraper #1199 (#1200)

Fixes issue #1196 The scraper now provides more information like
university of user, location where user works, topics he knows, number of
followers, number of questions, number of edits, number of blogs etc.

* Fix #1188: Use unbescape to unescape HTML in html2utf8 (#1194)

Also improve whitespace cleaning in the method. Move old implementation to html2utf8Custom.

* Fixes #1097: Restore access specifiers in TwitterScraper.java (#1198)

* Fix indentation (#1211)

* Fix #1212: fix checkstyle errors(except missing javadoc) (#1218)

* Fixes #1215 fix syntax error in the script (#1217)

* Fix #1213: Include videos for testing TwitterScraper (#1221)

* Fix 1216: Revert "Installation and Start on a user specified port (#1159)" (#1227)

This reverts commit 1e0bcd5.

Conflicts (resolved):
	bin/installation.sh
	bin/start.sh

* Fixes #1202: Modify loggers in Loklak Server for testing (#1222)

* Fixes #1219: Add UTC time in TimeAndDateService (#1220)

* Fixes #1112: Add image, video filter constraints for cache (#1190)

* Fixes #1236: Update Docs for get parameter (#1237)

* Fixes #1226 Build error currently showing (#1228)

* Fixes 1215 Fix relative link

* Update git to work with subtree

* Adding echo statements

* Fix #1239: Correct flag values in config.properties

* Fix #1238: Add PriorityQueue harvesting strategy (#1240)

Also add score related to each Tweet based on retweet and favourite count.

* Fix #1251: Correct test case for RedirectUnshortener (#1253)

http://t.co/E3w7s2qdBT now points to http://www.mostviralfeed.com/what-lady-gaga-actually-looks-like instead of http://mostviralfeed.com/what-lady-gaga-actually-looks-like

* Fix #1247: Add function to collect stats about all classes for a classifier (#1248)

* Fix #1256: Add classifier.json endpoint to serve aggregated data (#1257)

* refactoring to have the same naming as in susi_server

* Fixes #1261: RedirectUnshortener link fix (#1262)

* Fixes #1229, #1235, Related #1230: Setup of testable version (#1250)

1) setup post and basescraper

2) Setup quoraprofilescraper with basescraper and post

* Fix #1259: Add function for time sensitive aggregation (#1260)

* Fix #1271: Correct redirect link in test (#1272)

* Fix #1266: Allow time based aggregation in /api/classifier.json (#1267)

* Fix #1278: Correct typo in kaizen.md (#1279)

* enhanced elasticsearch mapping

* eclipse classpath to use same as gradle

* removed unused imports

* Fix #1268: Add function for aggregation based on country codes (#1270)

Following operations are now possible -
* All time aggregation for all countries
* Time sensitive aggregation for all countries
* All previous aggregations for selected countries

* Fix #1273: Add Jacoco to provide coverage report in XML format (#1274)

* Fixes 1284: Improve test cases for URL unshortener (#1285)

* Setup post and basescraper with QuoraProfileScraper (#1249)

* Setup of testable version

setup post and basescraper

* Related #1230, 1231, 1244: integrate Timeline2 with quorascraper

* Configure ssh agent before push

vibhcool added a commit to vibhcool/loklak_server that referenced this pull request Jul 15, 2017

Fix #1205: Extract complete video URLs for Tweets (#1206)
This implementation mimics the video playback flow of mobile react app of Twitter.
    1. Extract BEARER_TOKEN holding script's URL.
    2. Extract guest session token.
    3. Extract BEARER_TOKEN from URL in 1.
    4. Make Twitter API call with the parameters.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment