Extract complete video URLs for Tweets #1206

Merged
mariobehling merged 1 commit into loklak:development from singhpratyush:1171 on Jun 1, 2017

@singhpratyush
Member

singhpratyush commented May 30, 2017

Short description

Solves #1205.

Related #1171, #1193.

This implementation mimics the video playback flow of Twitter's mobile React app.

  1. Extract the URL of the script that holds BEARER_TOKEN.
  2. Extract the guest session token.
  3. Extract BEARER_TOKEN from the script fetched from the URL in step 1.
  4. Make the Twitter API call with these parameters.
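
The extraction part of these steps could look roughly like the sketch below. It is an illustration only, not the code from this PR: the regular expressions, the `gt` cookie name, the header names and the sample markup are all assumptions, and the network calls are left out so that only the parsing logic is shown.

```python
import re

# All three patterns are hypothetical; the real page markup is not shown in this PR.
SCRIPT_URL_RE = re.compile(r'<script[^>]+src="([^"]+main[^"]*\.js)"')
GUEST_TOKEN_RE = re.compile(r'gt=(\d+)')
BEARER_RE = re.compile(r'BEARER_TOKEN\s*:\s*"([^"]+)"')


def _first_group(pattern, text):
    """Return the first capture group of the first match, or None."""
    match = pattern.search(text)
    return match.group(1) if match else None


def find_token_script_url(page_html):
    """Step 1: locate the URL of the script that holds BEARER_TOKEN."""
    return _first_group(SCRIPT_URL_RE, page_html)


def find_guest_token(page_html):
    """Step 2: pull the guest session token out of the page."""
    return _first_group(GUEST_TOKEN_RE, page_html)


def find_bearer_token(script_body):
    """Step 3: pull BEARER_TOKEN out of the script fetched in step 1."""
    return _first_group(BEARER_RE, script_body)


def build_api_headers(bearer_token, guest_token):
    """Step 4: assemble the request headers (header names are assumed)."""
    return {
        "Authorization": "Bearer " + bearer_token,
        "x-guest-token": guest_token,
    }
```

Keeping the parsing in small pure functions makes each step testable against saved HTML without hitting the network.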

Example -

$ http 'http://127.0.0.1:9000/api/search.json?q=dance+video&source=twitter'
HTTP/1.1 200 OK
Content-Length: 28907
Content-Type: application/json;charset=utf-8
Date: Tue, 30 May 2017 19:40:34 GMT
Expires: Tue, 30 May 2017 19:40:36 GMT
Last-Modified: Tue, 30 May 2017 19:40:34 GMT
Server: Jetty(9.3.z-SNAPSHOT)
Set-Cookie: JSESSIONID=m99m3a8oqnpkkzi9ns2xucz8;Path=/
X-Robots-Tag: noindex,noarchive,nofollow,nosnippet

{
    "readme_0": "THIS JSON IS THE RESULT OF YOUR SEARCH QUERY - THERE IS NO WEB PAGE WHICH SHOWS THE RESULT!",
    "readme_1": "loklak.org is the framework for a message search system, not the portal, read: http://loklak.org/about.html#notasearchportal",
    "readme_2": "This is supposed to be the back-end of a search portal. For the api, see http://loklak.org/api.html",
    "readme_3": "Parameters q=(query), source=(cache|backend|twitter|all), callback=p for jsonp, maximumRecords=(message count), minified=(true|false)",
    "search_metadata": {
        "cache_hits": 0,
        "client": "127.0.0.1",
        "count": "15",
        "count_backend": 0,
        "count_twitter_all": 0,
        "count_twitter_new": 15,
        "filter": "",
        "hits": 15,
        "maximumRecords": "20",
        "period": 7429,
        "query": "dance video",
        "scraperInfo": "local",
        "servicereduction": "false",
        "startRecord": "1",
        "time": 29043
    },
    "statuses": [
        ...
        {
            "audio": [],
            "audio_count": 0,
            "canonical_id": "",
            "classifier_language": "english",
            "classifier_language_probability": 4.53604936798195e-16,
            "created_at": "2017-05-30T19:39:19.000Z",
            "favourites_count": 0,
            "hashtags": [
                "dedication"
            ],
            "hashtags_count": 1,
            "hosts": [
                "pic.twitter.com"
            ],
            "hosts_count": 1,
            "id_str": "869639376531324928",
            "images": [
                "https://pbs.twimg.com/ext_tw_video_thumb/869638871658749954/pu/img/5xW6eHiTJe3x3Z8h.jpg",
                "https://pic.twitter.com/idrtzwp3Nw"
            ],
            "images_count": 2,
            "link": "https://twitter.com/annamorgandance/status/869639376531324928",
            "links": [
                "https://pic.twitter.com/idrtzwp3Nw"
            ],
            "links_count": 1,
            "mentions": [],
            "mentions_count": 0,
            "parent": "",
            "place_context": "ABOUT",
            "place_id": "",
            "place_name": "",
            "provider_type": "SCRAPED",
            "retweet_count": 0,
            "screen_name": "annamorgandance",
            "source_type": "TWITTER",
            "text": "My beaut DDE students Hannah & Carrie who can't resist a quick plié even when at Harry Potter World! #dedication https://pic.twitter.com/idrtzwp3Nw",
            "text_length": 147,
            "timestamp": "2017-05-30T19:40:57.207Z",
            "unshorten": {},
            "user": {
                "appearance_first": "2017-05-30T19:40:57.207Z",
                "appearance_latest": "2017-05-30T19:40:57.207Z",
                "name": "Anna Morgan Dance",
                "profile_image_url_https": "https://pbs.twimg.com/profile_images/868530941580513280/4lC0yEQz_bigger.jpg",
                "screen_name": "annamorgandance",
                "user_id": "745614723148877824"
            },
            "videos": [
                "https://video.twimg.com/ext_tw_video/869638871658749954/pu/vid/480x480/x2lkY3u7inN7g1wD.mp4",
                "https://video.twimg.com/ext_tw_video/869638871658749954/pu/vid/240x240/ANoa8-sHOAm0WJ7S.mp4",
                "https://video.twimg.com/ext_tw_video/869638871658749954/pu/vid/720x720/4W2O7kbxklEZBKyp.mp4",
                "https://video.twimg.com/ext_tw_video/869638871658749954/pu/pl/2l0ByK24IQ2qFM2o.m3u8"
            ],
            "videos_count": 4,
            "without_l_len": 112,
            "without_lu_len": 112,
            "without_luh_len": 100
        },
        ...
        {
            "audio": [],
            "audio_count": 0,
            "canonical_id": "",
            "classifier_language": "english",
            "classifier_language_probability": 6.234241689372219e-15,
            "created_at": "2017-05-30T19:38:39.000Z",
            "favourites_count": 0,
            "hashtags": [],
            "hashtags_count": 0,
            "hosts": [
                "pic.twitter.com"
            ],
            "hosts_count": 1,
            "id_str": "869639212047454210",
            "images": [
                "https://pbs.twimg.com/ext_tw_video_thumb/869639124390625280/pu/img/pJQLGxxK5FMHDRxS.jpg",
                "https://pic.twitter.com/jzgfD4cHos"
            ],
            "images_count": 2,
            "link": "https://twitter.com/prettylittlerem/status/869639212047454210",
            "links": [
                "https://pic.twitter.com/jzgfD4cHos"
            ],
            "links_count": 1,
            "mentions": [
                "missremiashten",
                "breetheunikitty",
                "Elias824",
                "millselle"
            ],
            "mentions_count": 4,
            "parent": "",
            "place_context": "ABOUT",
            "place_id": "",
            "place_name": "",
            "provider_type": "SCRAPED",
            "retweet_count": 0,
            "screen_name": "prettylittlerem",
            "source_type": "TWITTER",
            "text": "THIS DANCE VID WAS AMAZING!!!! TAKE THAT ELLE! @missremiashten @breetheunikitty @Elias824 @millselle https://pic.twitter.com/jzgfD4cHos",
            "text_length": 135,
            "timestamp": "2017-05-30T19:41:02.825Z",
            "unshorten": {},
            "user": {
                "appearance_first": "2017-05-30T19:41:02.825Z",
                "appearance_latest": "2017-05-30T19:41:02.825Z",
                "name": "MISS REMI ASHTEN",
                "profile_image_url_https": "https://pbs.twimg.com/profile_images/848365216027078656/676UcZuR_bigger.jpg",
                "screen_name": "prettylittlerem",
                "user_id": "843235916093231106"
            },
            "videos": [
                "https://video.twimg.com/ext_tw_video/869639124390625280/pu/vid/240x240/zp2c3364FDfSRej7.mp4",
                "https://video.twimg.com/ext_tw_video/869639124390625280/pu/vid/480x480/W7hByhvogFvZlTfZ.mp4",
                "https://video.twimg.com/ext_tw_video/869639124390625280/pu/pl/vg2UXk7rssHUq0Ga.m3u8"
            ],
            "videos_count": 3,
            "without_l_len": 100,
            "without_lu_len": 46,
            "without_luh_len": 46
        },
        ...
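
A client consuming the "videos" array above may want a single playable file. The following is a minimal sketch, assuming the resolution is always encoded as a `/vid/<width>x<height>/` path segment as in the sample output; `best_mp4` is a hypothetical helper, not part of loklak.

```python
import re

# Assumes the "/vid/<width>x<height>/" segment seen in the sample URLs.
RESOLUTION_RE = re.compile(r"/vid/(\d+)x(\d+)/")


def best_mp4(video_urls):
    """Return the MP4 URL with the largest pixel area, or None if there is none."""
    best_url, best_area = None, -1
    for url in video_urls:
        match = RESOLUTION_RE.search(url)
        if not url.endswith(".mp4") or match is None:
            continue  # skip playlists (.m3u8) and URLs without a size segment
        area = int(match.group(1)) * int(match.group(2))
        if area > best_area:
            best_url, best_area = url, area
    return best_url


videos = [
    "https://video.twimg.com/ext_tw_video/869638871658749954/pu/vid/480x480/x2lkY3u7inN7g1wD.mp4",
    "https://video.twimg.com/ext_tw_video/869638871658749954/pu/vid/240x240/ANoa8-sHOAm0WJ7S.mp4",
    "https://video.twimg.com/ext_tw_video/869638871658749954/pu/vid/720x720/4W2O7kbxklEZBKyp.mp4",
    "https://video.twimg.com/ext_tw_video/869638871658749954/pu/pl/2l0ByK24IQ2qFM2o.m3u8",
]
print(best_mp4(videos))
```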

I have:

  • There is a corresponding issue for this pull request.
  • Mentioned the issue number in the pull request commit message as Fixes #<number>.
  • There is strictly only one commit per issue.

For the reviewers

I have:

  • This pull request has been reviewed by an authorized contributor.
  • The reviewer is assigned to the pull request.
@singhpratyush singhpratyush force-pushed the singhpratyush:1171 branch 2 times, most recently from b32a2c5 to be67f20 May 30, 2017
@hemantjadon

hemantjadon commented May 31, 2017

Hi @singhpratyush, I don't know why this seems to have modified the whole file. Can you please check?

@vibhcool

Member

vibhcool commented May 31, 2017

@singhpratyush, could you add part of the output to a gist and share its link instead of the complete output?
Btw, good job 👍

@singhpratyush

Member Author

singhpratyush commented May 31, 2017

@vibhcool: Done! Thanks.

@hemantjadon: I'm not sure, but I guess it's about the line breaks. In the earlier version they were CRLF, and I converted them to LF.
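
One way to check whether a whole-file diff comes from line endings is to count them in both versions of the file. This is a minimal sketch; the helper names are made up for illustration.

```python
def count_line_endings(data: bytes):
    """Count CRLF and bare LF line endings in raw file content."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf  # every CRLF also contains one LF
    return {"crlf": crlf, "lf": lf}


def to_lf(data: bytes) -> bytes:
    """Convert CRLF line endings to LF, leaving existing LF endings alone."""
    return data.replace(b"\r\n", b"\n")
```

If the old blob reports only CRLF endings and the new one only LF, every line differs at the byte level even though the visible text is unchanged.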

@hemantjadon

hemantjadon commented May 31, 2017

@singhpratyush Yeah, that may be the case. We can create an .editorconfig file, which will handle such cases so that this doesn't happen again and uniformity is maintained across all devs.

@singhpratyush

Member Author

singhpratyush commented May 31, 2017

@hemantjadon: That would be good, but should go to another PR I guess.

@vibhcool

Member

vibhcool commented May 31, 2017

@singhpratyush, it is difficult to review a complete module of 835 lines, please fix it. 😅

@singhpratyush singhpratyush force-pushed the singhpratyush:1171 branch from be67f20 to aa51f6c May 31, 2017
    This implementation mimics the video playback flow of mobile react app of Twitter.
    1. Extract BEARER_TOKEN holding script's URL.
    2. Extract guest session token.
    3. Extract BEARER_TOKEN from URL in 1.
    4. Make Twitter API call with the parameters.
@singhpratyush singhpratyush force-pushed the singhpratyush:1171 branch from aa51f6c to 713a928 May 31, 2017
@singhpratyush

Member Author

singhpratyush commented May 31, 2017

Member

vibhcool left a comment

LGTM

@mariobehling mariobehling merged commit 7c513ae into loklak:development Jun 1, 2017
2 checks passed
codacy/pr: Good work! A positive pull request.
continuous-integration/travis-ci/pr: The Travis CI build passed
@singhpratyush singhpratyush deleted the singhpratyush:1171 branch Jun 15, 2017
mariobehling added a commit that referenced this pull request Jul 5, 2017
* deploy button info for docker #1001

* Fixes #1045 : Replace the image logo in navigation bar with a text


* Fixes #1048: fix execution of method without query string

* fix for latest twitter html change

* add docker status badge

This is related to issue #1049

* update documentation path

Problem: The documentation has moved. The links in the README are outdated
Solution: insert the containing folder into the path of all links to docs

* moved dockerfile to root folder #1049

* changed travis build to new location

* updated Dockerfile path for compose

This is for issue #1049.
The pull request #1050 is a precondition for this to make sense.

Problem: the Dockerfile was moved to /
Solution: adapt the path

* get aggregations also with fresh requests from twitter with source=all

* Fixes #1060: Increase default Xmx value

* Fixed #1059 - Remove and Ignore .DS_Store

* Fixes #1067 - Tweet URL in README is broken

* corrected heading

"Where do I find the java?" ->"Where do I find the Java documentation?"

* Using the note directive of sphinx

See #1042

* README.md upd, useful links added

* Fixes #1033, loklak_server README.md upd, links updated with link syntax

* Move documentation site

The documentation site is now moved to https://github.com/loklak/dev.loklak.org

Closes #1014

* fix username emoji in tweet

* Fix unused imports in python files(codacy issue)

Related to #1070

* removed .DS_Store

* added .DS_Store to gitignore

* Fix use of Null in scala code

Related to #1070

* fixed scraper

* Edited Readme

* Add update trigger script for docs

Closes #1003

* Creating Volume for persistence while deploying via docker, fix #1051 (#1089)

* Update Dockerfile

* Update Dockerfile

* Update docker-compose.yml

* Update docker-compose.yml

* Update docker-compose.yml

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* Update docker-compose.yml

* Update docker-compose.yml

* Update docker-compose.yml

* Update docker-compose.yml

* updated docker build badge

I changed the url so github requests a new image.
The build works.
https://hub.docker.com/r/mariobehling/loklak/builds/

* Docker: Consistent Volume Path

Problem: docker-compose volume path is not the same as the dockerfile volume path
Solution: Set the docker-compose volume path to the dockerfile volume path

You can view the correct path in the Dockerfile:
https://github.com/loklak/loklak_server/blob/7a1f0378dc40ec25eec6083e43558a62408d84e8/Dockerfile#L38
I checked in the container:
```
bash-4.3# ls /loklak_server/
bin              conf             gradlew          settings.gradle
build            data             html             src
build.gradle     gradle           installation     ssi
bash-4.3# ls /
bin            lib            proc           srv            var
dev            loklak_server  root           sys
etc            media          run            tmp
home           mnt            sbin           usr
```
the data directory exists and is filled within `/loklak_server`

* .travis.yml: Add keys for dev.loklak.org

Closes #1091

* fix initGet

* option to autodelete messages after one month from the main index

* disabling feature introduced with
27272ee
for issue #919

The storage of the settings file caused the settings file to
break. It blew up to a huge file, like
$ ls -l customized_config.properties
-rw-r--r-- 1 loklak loklak 251650030 Apr 10 19:08
customized_config.properties

This is the main reason loklak.org has been down since this feature was
introduced.

* Fixes #1099 : Changes the href link of the button download, install and extend

* fix #1056 - document how to start contributing (#1063)

* Added JS EventListener to resize dump iframe on load. Closes #1101

* Add Unit Tests to Loklak Server (#1098)

* Add unit tests for TwitterScraper.java

* Add data file to test JSONRandomAccessFileTest.java

* set up unit tests build in loklak Server

* fix changes requested and codacy issues

* fixes scrollbar event

* in the Twitter scraper, use a more readable version of assert; also fix a bug with parsing longs in the YouTube scraper (the Long.parse method fails because spaces are not removed) and add a unit test for the YouTube scraper.

* fix bug with YouTube scraper and add unit test for scraper

* Fixes #1103: Changed the URLs to the correct ones (#1104)

* Fixes #1103: Changed the URLs to the correct ones

* Fixes #1108: Fixed the typos in documentation

* fix and modify the GithubProfileScraper.java

* fixes #961: add query in KaizenHarvester's queue to get older Tweets

If the current timeline's query already has an until statement, replace its date part with the oldest one. Also add a DateFormat object in KaizenHarvester to format a Date as a String of the form yyyy-MM-dd.
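
The query rewrite described here can be sketched as follows. This is not the Java code from the commit: the `until:yyyy-MM-dd` operator syntax and the choice to append the operator when it is missing are assumptions, and `with_until` is a hypothetical name.

```python
import re
from datetime import date

# Assumed operator syntax: "until:" followed by a yyyy-MM-dd date.
UNTIL_RE = re.compile(r"\buntil:\d{4}-\d{2}-\d{2}\b")


def with_until(query, oldest):
    """Point the query's until statement at the oldest date seen so far."""
    stamp = "until:" + oldest.strftime("%Y-%m-%d")
    if UNTIL_RE.search(query):
        # An until statement already exists: replace only its date part.
        return UNTIL_RE.sub(stamp, query)
    # Assumption: otherwise the statement is appended to the query.
    return query + " " + stamp
```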

* fix eclipse classpath for storing classes (#1097)

* Fixes #1123: Adding Gemnasium Button & Fixing Docker build button

* Fix Codacy issue in Timeline.java. Related #1070

Link to codacy: https://www.codacy.com/app/sudheesh1995/loklak_server/file/6470204147/issues/source?bid=3495500&fileBranchId=3495500
Description: Fields should be declared at the top of the class

* Fix Codacy issue for some files in org.loklak.server.api. Related #1070

* ConsoleService.java
  - Fields should be declared at the top of the class
  - https://www.codacy.com/app/sudheesh1995/loklak_server/file/6484902617/issues/source?bid=3495500&fileBranchId=3495500

* EventBriteCrawler.java
  - Make spacing consistent for conditionals

* GraphServlet.java
  - Reduce complexity of doGet method
  - https://www.codacy.com/app/sudheesh1995/loklak_server/file/6484903642/issues/source?bid=3495500&fileBranchId=3495500

* Rename Dockerfile-learnings.md to docs/Dockerfile-learnings.md

* fix #1138: Correct spelling mistake in README.md (#1140)

Change "descripe" to "describe" in How to Contribute section.

* Fixes #1123: Adding Gemnasium Button & Fixing Docker build button in rst file (#1137)

* Related #1070: Fix Codacy issues for files in org.loklak.api.search (#1134)

* EventBriteCrawlerService.java
  - Use one line for each declaration
  - https://www.codacy.com/app/sudheesh1995/loklak_server/file/6497733425/issues/source?bid=3495500&fileBranchId=3495500

* GenericScraper.java
  - Indentation fix
  - New line before EOF

* GithubProfileScraper.java
  - Remove trailing whitespaces

* MeetupsCrawlerService.java
  - Use one line for each declaration
  - https://www.codacy.com/app/sudheesh1995/loklak_server/file/6497733676/issues/source?bid=3495500&fileBranchId=3495500

* SearchServlet.java
  - Indentation fix

* SuggestServlet.java
  - Position literals first in String comparisons
  - Fields should be declared at the top of the class
  - https://www.codacy.com/app/sudheesh1995/loklak_server/file/6497733640/issues/source?bid=3495500&fileBranchId=3495500

* WeiboUserInfo.java
  - Switch statements should have a default label
  - https://www.codacy.com/app/sudheesh1995/loklak_server/file/6497733550/issues/source?bid=3495500&fileBranchId=3495500

* Fixes #1139: Changed the URL (#1141)

* Fixes #1139: Changed the URL

* Fixes #1139: Changed the URL

* Fix "Strings must use doublequote. (quotes)"
Related to #1070

* Fix #1070: Strings must use doublequote. (quotes), no-use-before-define

* Fix #1070: Strings must use doublequote. (quotes), no-use-before-define

* Fixes #1070:Strings must use double quotes, no-use-before-define

* Related to #1070:Strings must use double quotes, no-use-before-define

* Related #1058: Add Kaizen harvester usage documentation (#1145)

* Fix #1070: Strings must use doublequote. (quotes), no-use-before-define (#1121)

* Fix "Strings must use doublequote. (quotes)"
Related to #1070

* Fix #1070: Strings must use doublequote. (quotes), no-use-before-define

* Fix #1070: Strings must use doublequote. (quotes), no-use-before-define

* fix #1130: Make retries and back off parameter for backend push configurable (#1131)

These variables can be set from config.properties by changing/defining caretaker.backendpush.retries and caretaker.backendpush.backoff respectively.

* Fixes part of #1132: Add unit test to check TwitterScraper output (#1133)

* convert markdown file to rst (#1142)

* Merged development fixed conflict.

* Improve code quality for org.loklak.geo.*

* Related #1070: Improve code quality for org.loklak.api.admin.* (#1149)

* Related to #1070: Improve code quality for org.loklak.Crawler.java

* fix related to #1152: code refactoring for logging (#1153)

* fix related to #1133: fix access specifiers (#1151)

* fixes #1161: Add GCloud Kubernetes deployment document for loklak (#1162)

* fixes #1146: Check for TwitterFactory before getting instance (#1147)

* Related #1070: Fix Codacy issues for org.loklak.api.amazon.* (#1163)

* fix #1143: Fix NumberException in YoutubeScraper (#1157)

* Installation and Start on a user specified port (#1159)

Solves issue: #925

* Fixes #1165: Fixed the QuoraProfileScraper and displaying profileImage

* Related to #1112:Add filter for images, videos (#1164)

* Related #1156: Make harvesting decision biased for Kaizen (#1158)

A probability is computed as queuries.size() / QUERIES_LIMIT and compared with a randomly chosen target probability; the decision to harvest is taken accordingly. If there is no limit on the queue size, the probability to harvest is set to 0.5.
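
A rough sketch of such a biased decision is shown below. The function names are hypothetical, and the direction of the comparison (a fuller queue makes harvesting more likely) is an assumption, since the commit message does not spell it out.

```python
import random


def harvest_probability(queue_size, queries_limit=None):
    """Probability of harvesting: queue fill ratio, or 0.5 without a limit."""
    if not queries_limit or queries_limit <= 0:
        return 0.5  # no limit on the queue size
    return min(1.0, queue_size / queries_limit)


def should_harvest(queue_size, queries_limit=None, rng=random):
    """Compare the fill ratio with a randomly chosen target probability."""
    # Assumption: harvest when the random target falls below the fill ratio.
    return rng.random() < harvest_probability(queue_size, queries_limit)
```

With a full queue the function always decides to harvest, and with an empty bounded queue it never does; in between, the decision is random but weighted by how full the queue is.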

* Fixes #1167 GithubScraperService able to scrape user specific data (#1168)

Fixes issue #1167. The GithubProfileScraper service now displays starred_url,
the number of starred repos, followers_url, the number of followers, following_url,
and the number of accounts followed for a particular user.

* fixes #1114 Improve URL shortening service

* Include all 30X HTTP response code while checking for redirect.
* Use POST requests as a fallback for GET requests - There are many cases (mostly https?://fb.me/*) where a GET request gives status 400: Bad Request, while a POST request works fine. The patch allows an attempt at a POST request in such cases to fetch the result.
* Try to fetch URL from <meta/> tag in response body in case of non redirect status code.
* Check the validity of URL shortening only once, and not for each intermediate URL.
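
The first three points could be combined roughly as in this sketch. It is not the Java implementation from the commit: `request` is a hypothetical callable returning `(status_code, location_header, body)`, and reading the target from a meta refresh tag is an assumption, since the message only mentions a `<meta/>` tag.

```python
import re

# Assumption: the target URL sits in a meta refresh tag.
META_URL_RE = re.compile(
    r'<meta[^>]+http-equiv=["\']?refresh["\']?[^>]*content=["\'][^"\']*url=([^"\'>\s]+)',
    re.IGNORECASE,
)


def is_redirect(status_code):
    """True for every 30X status code, not only 301 and 302."""
    return status_code // 10 == 30


def url_from_meta(body):
    """Return the URL named in a meta refresh tag, or None."""
    match = META_URL_RE.search(body)
    return match.group(1) if match else None


def next_hop(url, request):
    """Return the URL that `url` points to, or None if it does not redirect."""
    status, location, body = request("GET", url)
    if status == 400:
        # Fall back to POST when GET is rejected with 400: Bad Request.
        status, location, body = request("POST", url)
    if is_redirect(status) and location:
        return location
    # Non-redirect status: look for a target in the response body.
    return url_from_meta(body)
```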

* Displays proper url to open loklak_server

Solves issue: #1172

Displays proper localhost url in which loklak_server is running after
the execution of bin/start.sh or bin/installation.sh with a "p" flag.

Earlier, the localhost URL always displayed port 9000 at the end in the case of
bin/start.sh, and concatenated the running port with 9000 in the case of
bin/installation.sh. Ex:
http://localhost:9000 # bin/start.sh, actual port 8888
http://localhost:90008888 #bin/installation.sh, actual port 8888

* fixes #1177 - Added tests for WordpressCrawlerService.java

fixes issue #1177. Added tests for WordpressCrawlerService.java and
also removed the leading 'Author' from the author field in json
output.

* fix #1176: Fetch debug flag from config file

Change configurations for TwitterScraper and ClientConnection

* fixes #1184 - Instagram Profile Scraper is now working

fixes issue #1184. Instagram scraper is now returning data.

* fix #1179: Use java.net.URL to build relative URL in ClientConnection (#1183)

* fixes #1070: Add test for URL unshortening (#1173)

* fixes #1169 - Added test for Github profile scraper (#1185)

fixes issue #1169, Added tests for GithubProfileScraper service.

* Improve code quality for some files in org.loklak.api.cms and add checkstyle as gradle task (#1187)

* Related #1070: Improve code quality for some files in org.loklak.api.cms

Fixes are done using checkstyle with google_check.xml config and 4 space indentation level

* Add checkstyle check as gradle task

* Fixes #1191: NullPointerException in CareTaker.java (#1192)

* Auto-generate docs in dev.loklak.org repository (#1195)

* Fix #1171: Extract video URLs from IFrame (#1193)

Videos are added as an IFrame for Twitter. To fetch the video URLs, we first fetch the IFrame page and then check for the video format. If it is mp4, we're done. If it is m3u8, we need to fetch the m3u8 link in order to get actual videos. Mostly, these videos are of .ts format.
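
The mp4/m3u8 branching can be illustrated with the sketch below. It is a simplified stand-in for the Java code in the commit: `fetch_text` is a hypothetical callable that downloads a URL and returns its text, and the playlist handling only covers plain entry lines.

```python
from urllib.parse import urljoin


def playlist_entries(playlist_text, playlist_url):
    """Return absolute URLs for every entry line of an m3u8 playlist."""
    entries = []
    for line in playlist_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and #EXT... tags
        entries.append(urljoin(playlist_url, line))
    return entries


def resolve_video(url, fetch_text):
    """Return the playable URLs behind a video link found in the IFrame."""
    if url.endswith(".mp4"):
        return [url]  # already a direct video file
    if url.endswith(".m3u8"):
        # The playlist has to be fetched to reach the actual (mostly .ts) files.
        return playlist_entries(fetch_text(url), url)
    return []
```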

Also add org.unbescape as gradle dependency to unescape string in iframe.

* Fix #1201: Break down KaizenHarvester into simpler pieces (#1203)

Introduce KaizenQuery class to support different methods to store queries that Kaizen needs to process

* Fix #1208: Add .editorconfig (#1209)

* Fixes #1204 Add subtree if not already added (#1207)

* Fix #1205: Extract complete video URLs for Tweets (#1206)

This implementation mimics the video playback flow of mobile react app of Twitter.
    1. Extract BEARER_TOKEN holding script's URL.
    2. Extract guest session token.
    3. Extract BEARER_TOKEN from URL in 1.
    4. Make Twitter API call with the parameters.

* fixes #1196 - Enhanced Quora profile scraper #1199 (#1200)

Fixes issue #1196. The scraper now provides more information, such as the
user's university, the location where the user works, topics the user knows, number of
followers, number of questions, number of edits, number of blogs, etc.

* Fix #1188: Use unbescape to unescape HTML in html2utf8 (#1194)

Also improve whitespace cleaning in the method. Move old implementation to html2utf8Custom.

* Fixes #1097: Restore access specifiers in TwitterScraper.java (#1198)

* Fix indentation (#1211)

* Fix #1212: fix checkstyle errors(except missing javadoc) (#1218)

* Fixes #1215 fix syntax error in the script (#1217)

* Fix #1213: Include videos for testing TwitterScraper (#1221)

* Fix 1216: Revert "Installation and Start on a user specified port (#1159)" (#1227)

This reverts commit 1e0bcd5.

Conflicts (resolved):
	bin/installation.sh
	bin/start.sh

* Fixes #1202: Modify loggers in Loklak Server for testing (#1222)

* Fixes #1219: Add UTC time in TimeAndDateService (#1220)

* Fixes #1112: Add image, video filter constraints for cache (#1190)

* Fixes #1236: Update Docs for get parameter (#1237)

* Fixes #1226 Build error currently showing (#1228)

* Fixes 1215 Fix relative link

* Update git to work with subtree

* Adding echo statements

* Fix #1239: Correct flag values in config.properties

* Fix #1238: Add PriorityQueue harvesting strategy (#1240)

Also add score related to each Tweet based on retweet and favourite count.
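
A score-ordered queue of this kind might look like the following sketch. The scoring formula (a plain sum of retweet and favourite counts) and the class name are assumptions; the commit only says the score is based on those two counts.

```python
import heapq


def tweet_score(tweet):
    """Assumed score: retweet count plus favourite count."""
    return tweet.get("retweet_count", 0) + tweet.get("favourites_count", 0)


class ScoredQueue:
    """Priority queue that hands out the highest-scoring tweet first."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker so dicts are never compared

    def push(self, tweet):
        # heapq is a min-heap, so the score is negated.
        heapq.heappush(self._heap, (-tweet_score(tweet), self._counter, tweet))
        self._counter += 1

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```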

* Fix #1251: Correct test case for RedirectUnshortener (#1253)

http://t.co/E3w7s2qdBT now points to http://www.mostviralfeed.com/what-lady-gaga-actually-looks-like instead of http://mostviralfeed.com/what-lady-gaga-actually-looks-like

* Fix #1247: Add function to collect stats about all classes for a classifier (#1248)

* Fix #1256: Add classifier.json endpoint to serve aggregated data (#1257)

* refactoring to have the same naming as in susi_server

* Fixes #1261: RedirectUnshortener link fix (#1262)

* Fixes #1229, #1235, Related #1230: Setup of testable version (#1250)

1) setup post and basescraper

2) Setup quoraprofilescraper with basescraper and post

* Fix #1259: Add function for time sensitive aggregation (#1260)

* Fix #1271: Correct redirect link in test (#1272)

* Fix #1266: Allow time based aggregation in /api/classifier.json (#1267)

* Fix #1278: Correct typo in kaizen.md (#1279)

* enhanced elasticsearch mapping

* eclipse classpath to use same as gradle

* removed unused imports

* Fix #1268: Add function for aggregation based on country codes (#1270)

Following operations are now possible -
* All time aggregation for all countries
* Time sensitive aggregation for all countries
* All previous aggregations for selected countries

* Fix #1273: Add Jacoco to provide coverage report in XML format (#1274)

* Fixes 1284: Improve test cases for URL unshortener (#1285)

* Setup post and basescraper with QuoraProfileScraper (#1249)

* Setup of testable version

setup post and basescraper

* Related #1230, 1231, 1244: integrate Timeline2 with quorascraper

* Configure ssh agent before push
vibhcool added a commit to vibhcool/loklak_server that referenced this pull request Jul 6, 2017
Related to loklak#1070

* Fix loklak#1070: Strings must use doublequote. (quotes), no-use-before-define

* Fix loklak#1070: Strings must use doublequote. (quotes), no-use-before-define

* Fixes loklak#1070:Strings must use double quotes, no-use-before-define

* Related to loklak#1070:Strings must use double quotes, no-use-before-define

* Related loklak#1058: Add Kaizen harvester usage documentation (loklak#1145)

* Fix loklak#1070: Strings must use doublequote. (quotes), no-use-before-define (loklak#1121)

* Fix "Strings must use doublequote. (quotes)"
Related to loklak#1070

* Fix loklak#1070: Strings must use doublequote. (quotes), no-use-before-define

* Fix loklak#1070: Strings must use doublequote. (quotes), no-use-before-define

* fix loklak#1130: Make retries and back off parameter for backend push configurable (loklak#1131)

These variables can be set from config.properties by changing/defining caretaker.backendpush.retries and caretaker.backendpush.backoff respectively.
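
A minimal sketch of reading these two keys, assuming a plain `java.util.Properties` object. The class name and the fallback values (5 and 3000) are placeholders chosen for illustration, not the defaults used by loklak.

```java
import java.util.Properties;

// Hypothetical holder for the two configurable values.
final class BackendPushConfigSketch {
    final int retries;
    final long backoff;

    BackendPushConfigSketch(Properties config) {
        // The fallback values below are assumptions, not loklak's real defaults.
        this.retries = Integer.parseInt(
                config.getProperty("caretaker.backendpush.retries", "5").trim());
        this.backoff = Long.parseLong(
                config.getProperty("caretaker.backendpush.backoff", "3000").trim());
    }
}
```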

* Fixes part of loklak#1132: Add unit test to check TwitterScraper output (loklak#1133)

* convert markdown file to rst (loklak#1142)

* Merged development fixed conflict.

* Improve code quality for org.loklak.geo.*

* Related loklak#1070: Improve code quality for org.loklak.api.admin.* (loklak#1149)

* Related to loklak#1070: Improve code quality for org.loklak.Crawler.java

* fix related to loklak#1152: code refactoring for logging (loklak#1153)

* fix related to loklak#1133: fix access specifiers (loklak#1151)

* fixes loklak#1161: Add GCloud Kubernetes deployment document for loklak (loklak#1162)

* fixes loklak#1146: Check for TwitterFactory before getting instance (loklak#1147)

* Related loklak#1070: Fix Codacy issues for org.loklak.api.amazon.* (loklak#1163)

* fix loklak#1143: Fix NumberException in YoutubeScraper (loklak#1157)

* Installation and Start on a user specified port (loklak#1159)

Solves issue: loklak#925

* Fixes loklak#1165: Fixed the QuoraProfileScraper and displaying profileImage

* Related to loklak#1112:Add filter for images, videos (loklak#1164)

* Related loklak#1156: Make harvesting decision biased for Kaizen (loklak#1158)

A probability is computed as queries.size() / QUERIES_LIMIT and compared with a randomly chosen target probability to decide whether to harvest. If the queue size has no limit, the probability to harvest is set to 0.5.
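
The decision could look roughly like the sketch below. The names are hypothetical, and two details are assumptions: a non-positive limit stands for "no limit", and harvesting happens when the random target falls below the computed probability.

```java
import java.util.Random;

// Hypothetical sketch of the biased harvesting decision.
final class HarvestDecisionSketch {
    // Probability to harvest, given the current queue size and its limit.
    static double harvestProbability(int queueSize, int queriesLimit) {
        if (queriesLimit <= 0) {
            return 0.5; // assumption: a non-positive limit means "no limit"
        }
        // Floating-point division; capped at 1.0 in case the queue overshoots.
        return Math.min(1.0, (double) queueSize / queriesLimit);
    }

    // Assumption: harvest when the random target is below the probability.
    static boolean shouldHarvest(int queueSize, int queriesLimit, Random random) {
        return random.nextDouble() < harvestProbability(queueSize, queriesLimit);
    }
}
```

Note the cast to `double`: integer division would make the probability 0 until the queue is completely full.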

* Fixes loklak#1167 GithubScraperService able to scrape user specific data (loklak#1168)

Fixes issue loklak#1167. The githubprofilescraper service now displays starred_url,
number of starred repos, followers_url, number of followers, following_url, and
number of people following for a particular user.

* fixes loklak#1114 Improve URL shortening service

* Include all 30X HTTP response codes while checking for redirect.
* Use POST requests as fallback for GET requests - There are many cases (mostly https?://fb.me/*) when a GET request gives status 400: Bad Request, while a POST request works fine. The patch attempts a POST request in such cases and fetches the result.
* Try to fetch the URL from the <meta/> tag in the response body in case of a non-redirect status code.
* Check the validity of URL shortening only once, and not for each intermediate URL.
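
Two of these checks can be sketched as below. This is not the code from the patch: the class and method names are hypothetical, and treating the `<meta/>` tag as a meta refresh with a `url=` target is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helpers for the redirect checks described above.
final class UnshortenSketch {
    // Assumption: the target URL sits in a <meta http-equiv="refresh" content="...;url=..."> tag.
    private static final Pattern META_REFRESH = Pattern.compile(
            "<meta[^>]*http-equiv=[\"']?refresh[\"']?[^>]*content=[\"'][^\"']*?url=([^\"'>\\s]+)",
            Pattern.CASE_INSENSITIVE);

    // True for every 30X status code (300 to 309).
    static boolean isRedirect(int status) {
        return status / 10 == 30;
    }

    // Returns the meta refresh target, or null when the body has none.
    static String metaRefreshTarget(String html) {
        Matcher matcher = META_REFRESH.matcher(html);
        return matcher.find() ? matcher.group(1) : null;
    }
}
```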

* Displays proper url to open loklak_server

Solves issue: loklak#1172

Displays the proper localhost URL on which loklak_server is running after
the execution of bin/start.sh or bin/installation.sh with a "p" flag.

Earlier, the localhost URL always ended in port 9000 in case of
bin/start.sh, and the running port was appended to 9000 in case of
bin/installation.sh. Example:
http://localhost:9000 # bin/start.sh, actual port 8888
http://localhost:90008888 # bin/installation.sh, actual port 8888

* fixes loklak#1177 - Added tests for WordpressCrawlerService.java

fixes issue loklak#1177. Added tests for WordpressCrawlerService.java and
also removed the leading 'Author' from the author field in json
output.

* fix loklak#1176: Fetch debug flag from config file

Change configurations for TwitterScraper and ClientConnection

* fixes loklak#1184 - Instagram Profile Scraper is now working

fixes issue loklak#1184. Instagram scraper is now returning data.

* fix loklak#1179: Use java.net.URL to build relative URL in ClientConnection (loklak#1183)
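
As a rough illustration of the idea (not the ClientConnection code itself), the two-argument `java.net.URL` constructor resolves a relative reference against a base URL. The class and method names below are hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper showing relative URL resolution with java.net.URL.
final class RelativeUrlSketch {
    // Resolves `location` (absolute or relative) against `base`.
    static String resolve(String base, String location) {
        try {
            return new URL(new URL(base), location).toString();
        } catch (MalformedURLException e) {
            // Rethrown unchecked to keep the sketch short.
            throw new IllegalArgumentException(e);
        }
    }
}
```

An absolute `location` is returned as-is, so the same call covers both kinds of redirect targets.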

* fixes loklak#1070: Add test for URL unshortening (loklak#1173)

* fixes loklak#1169 - Added test for Github profile scraper (loklak#1185)

fixes issue loklak#1169, Added tests for GithubProfileScraper service.

* Improve code quality for some files in org.loklak.api.cms and add checkstyle as gradle task (loklak#1187)

* Related loklak#1070: Improve code quality for some files in org.loklak.api.cms

Fixes are done using checkstyle with google_check.xml config and 4 space indentation level

* Add checkstyle check as gradle task

* Fixes loklak#1191: NullPointerException in CareTaker.java (loklak#1192)

* Auto-generate docs in dev.loklak.org repository (loklak#1195)

* Fix loklak#1171: Extract video URLs from IFrame (loklak#1193)

Twitter adds videos as an IFrame. To fetch the video URLs, we first fetch the IFrame page and then check the video format. If it is mp4, we're done. If it is m3u8, we need to fetch the m3u8 link in order to get the actual videos, which are mostly in .ts format.

Also add org.unbescape as gradle dependency to unescape string in iframe.
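
The m3u8 branch can be sketched as follows, assuming the playlist text has already been downloaded. The class and method names are hypothetical; the only format rule relied on is that playlist lines starting with `#` are tags or comments and every other non-blank line is a media entry.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers for the mp4 / m3u8 distinction.
final class VideoLinkSketch {
    // True when the URL path (ignoring any query string) ends in .m3u8.
    static boolean isPlaylist(String url) {
        int query = url.indexOf('?');
        String path = query >= 0 ? url.substring(0, query) : url;
        return path.endsWith(".m3u8");
    }

    // Collects the media entries (for example .ts segments) of a playlist.
    static List<String> mediaEntries(String playlist) {
        List<String> entries = new ArrayList<>();
        for (String line : playlist.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty() && !trimmed.startsWith("#")) {
                entries.add(trimmed);
            }
        }
        return entries;
    }
}
```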

* Fix loklak#1201: Break down KaizenHarvester into simpler pieces (loklak#1203)

Introduce KaizenQuery class to support different methods to store queries that Kaizen needs to process

* Fix loklak#1208: Add .editorconfig (loklak#1209)

* Fixes loklak#1204 Add subtree if not already added (loklak#1207)

* Fix loklak#1205: Extract complete video URLs for Tweets (loklak#1206)

This implementation mimics the video playback flow of Twitter's mobile React app.
    1. Extract the URL of the script that holds BEARER_TOKEN.
    2. Extract the guest session token.
    3. Extract BEARER_TOKEN from the script at the URL from step 1.
    4. Make the Twitter API call with these parameters.
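
The first three steps can be sketched as below. Everything specific here is an assumption made for illustration: the class name, the three regular expressions (the real page markup is not given in the text), and the `fetch` function that stands in for an HTTP GET.

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the token extraction flow; patterns are assumptions.
final class VideoTokenFlowSketch {
    private static final Pattern SCRIPT_URL = Pattern.compile("<script[^>]+src=\"([^\"]+\\.js)\"");
    private static final Pattern GUEST_TOKEN = Pattern.compile("gt=(\\d+)");
    private static final Pattern BEARER = Pattern.compile("BEARER_TOKEN:\"([^\"]+)\"");

    // Returns the first capture group of the first match, or fails loudly.
    static String first(Pattern pattern, String text) {
        Matcher matcher = pattern.matcher(text);
        if (!matcher.find()) {
            throw new IllegalStateException("no match for " + pattern.pattern());
        }
        return matcher.group(1);
    }

    // Returns {guestToken, bearerToken}; `fetch` maps a URL to its body.
    static String[] tokens(String pageUrl, Function<String, String> fetch) {
        String page = fetch.apply(pageUrl);
        String scriptUrl = first(SCRIPT_URL, page);            // step 1
        String guest = first(GUEST_TOKEN, page);               // step 2
        String bearer = first(BEARER, fetch.apply(scriptUrl)); // step 3
        return new String[] {guest, bearer};                   // inputs for step 4
    }
}
```

Passing the fetcher in as a function keeps the flow testable without network access; step 4 would send both values along with the API request.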

* fixes loklak#1196 - Enhanced Quora profile scraper loklak#1199 (loklak#1200)

Fixes issue loklak#1196. The scraper now provides more information, such as the
user's university, the location where the user works, topics the user knows,
number of followers, number of questions, number of edits, number of blogs etc.

* Fix loklak#1188: Use unbescape to unescape HTML in html2utf8 (loklak#1194)

Also improve whitespace cleaning in the method. Move old implementation to html2utf8Custom.

* Fixes loklak#1097: Restore access specifiers in TwitterScraper.java (loklak#1198)

* Fix indentation (loklak#1211)

* Fix loklak#1212: fix checkstyle errors(except missing javadoc) (loklak#1218)

* Fixes loklak#1215 fix syntax error in the script (loklak#1217)

* Fix loklak#1213: Include videos for testing TwitterScraper (loklak#1221)

* Fix 1216: Revert "Installation and Start on a user specified port (loklak#1159)" (loklak#1227)

This reverts commit 1e0bcd5.

Conflicts (resolved):
	bin/installation.sh
	bin/start.sh

* Fixes loklak#1202: Modify loggers in Loklak Server for testing (loklak#1222)

* Fixes loklak#1219: Add UTC time in TimeAndDateService (loklak#1220)

* Fixes loklak#1112: Add image, video filter constraints for cache (loklak#1190)

* Fixes loklak#1236: Update Docs for get parameter (loklak#1237)

* Fixes loklak#1226 Build error currently showing (loklak#1228)

* Fixes 1215 Fix relative link

* Update git to work with subtree

* Adding echo statements

* Fix loklak#1239: Correct flag values in config.properties

* Fix loklak#1238: Add PriorityQueue harvesting strategy (loklak#1240)

Also add a score to each Tweet based on its retweet and favourite counts.
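
A minimal sketch of a score-ordered queue, assuming the score is the plain sum of retweet and favourite counts and that higher scores are processed first. Both points, and all names, are assumptions; the actual scoring formula is not stated here.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical scored entry for a PriorityQueue-based strategy.
final class ScoredQuerySketch {
    final String query;
    final long score;

    ScoredQuerySketch(String query, long retweets, long favourites) {
        this.query = query;
        this.score = retweets + favourites; // assumption: simple sum
    }

    // Queue that hands out the highest-scored entry first.
    static PriorityQueue<ScoredQuerySketch> newQueue() {
        return new PriorityQueue<>(
                Comparator.comparingLong((ScoredQuerySketch q) -> q.score).reversed());
    }
}
```

`PriorityQueue` is a min-heap by default, hence the reversed comparator.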

* Fix loklak#1251: Correct test case for RedirectUnshortener (loklak#1253)

http://t.co/E3w7s2qdBT now points to http://www.mostviralfeed.com/what-lady-gaga-actually-looks-like instead of http://mostviralfeed.com/what-lady-gaga-actually-looks-like

* Fix loklak#1247: Add function to collect stats about all classes for a classifier (loklak#1248)

* Fix loklak#1256: Add classifier.json endpoint to serve aggregated data (loklak#1257)

* refactoring to have the same naming as in susi_server

* Fixes loklak#1261: RedirectUnshortener link fix (loklak#1262)

* Fixes loklak#1229, loklak#1235, Related loklak#1230: Setup of testable version (loklak#1250)

1) setup post and basescraper

2) Setup quoraprofilescraper with basescraper and post

* Fix loklak#1259: Add function for time sensitive aggregation (loklak#1260)

* Fix loklak#1271: Correct redirect link in test (loklak#1272)

* Fix loklak#1266: Allow time based aggregation in /api/classifier.json (loklak#1267)

* Fix loklak#1278: Correct typo in kaizen.md (loklak#1279)

* enhanced elasticsearch mapping

* eclipse classpath to use same as gradle

* removed unused imports

* Fix loklak#1268: Add function for aggregation based on country codes (loklak#1270)

The following operations are now possible -
* All time aggregation for all countries
* Time sensitive aggregation for all countries
* All previous aggregations for selected countries
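
The three operations can be illustrated with an in-memory sketch. This is not how the server computes them: the `Msg` type, its fields, and the method signature are hypothetical, and a `null` country set is assumed to mean "all countries".

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical in-memory version of the country/time aggregation.
final class ClassifierAggregationSketch {
    static final class Msg {
        final String country;  // country code, e.g. "DE"
        final String label;    // class assigned by the classifier
        final long createdAt;  // timestamp in any consistent unit

        Msg(String country, String label, long createdAt) {
            this.country = country;
            this.label = label;
            this.createdAt = createdAt;
        }
    }

    // Counts labels per country for messages created at or after `since`.
    // A null `countries` set means all countries (assumption).
    static Map<String, Map<String, Long>> aggregate(
            List<Msg> messages, Set<String> countries, long since) {
        return messages.stream()
                .filter(m -> m.createdAt >= since)
                .filter(m -> countries == null || countries.contains(m.country))
                .collect(Collectors.groupingBy(m -> m.country,
                        Collectors.groupingBy(m -> m.label, Collectors.counting())));
    }
}
```

Passing `Long.MIN_VALUE` as `since` gives the all-time aggregation, and a non-null set restricts any of the aggregations to the selected countries.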

* Fix loklak#1273: Add Jacoco to provide coverage report in XML format (loklak#1274)

* Fixes 1284: Improve test cases for URL unshortener (loklak#1285)

* Setup post and basescraper with QuoraProfileScraper (loklak#1249)

* Setup of testable version

setup post and basescraper

* Related loklak#1230, 1231, 1244: integrate Timeline2 with quorascraper

* Configure ssh agent before push
vibhcool added a commit to vibhcool/loklak_server that referenced this pull request Jul 6, 2017
vibhcool added a commit to vibhcool/loklak_server that referenced this pull request Jul 15, 2017
This implementation mimics the video playback flow of Twitter's mobile React app.
    1. Extract the URL of the script that holds BEARER_TOKEN.
    2. Extract the guest session token.
    3. Extract BEARER_TOKEN from the script at the URL from step 1.
    4. Make the Twitter API call with these parameters.