
Setup post and basescraper with QuoraProfileScraper #1249

Merged
merged 2 commits into from Jul 4, 2017


vibhcool
Member

@vibhcool vibhcool commented Jun 15, 2017

This PR is a follow-up to PR #1250.
These are the changes I have made so far:

  1. Modify the abstract class BaseScraper, which will be inherited by all scrapers
  2. Configure QuoraProfileScraper with BaseScraper and Post : Related Issue Add search scraping in QuoraProfileScraper #1230
  3. Add Timeline2 class : Related Issue Add Scrapers to SearchServlet #1231 , Refactor code of TimeLine.java #1244
    Timeline iterates over MessageEntry objects. Many classes depend on Timeline, so it can't be changed to iterate over Post objects. That is why I have created the Timeline2 class, to be used until TwitterScraper, which currently works independently, is configured with Post and BaseScraper.
  4. Configure Timeline2 with QuoraProfileScraper : Related Issue Add Scrapers to SearchServlet #1231
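To illustrate item 3, here is a minimal sketch of a Timeline-like container that iterates Post objects instead of MessageEntry objects. Post, QuoraPost, and this Timeline2 are simplified, hypothetical stand-ins written for this example, not the actual loklak classes (the real Timeline also handles ordering, JSON output, and more).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified stand-in for org.loklak.harvester.Post.
abstract class Post {
    private final long timestamp; // creation time of the post

    Post(long timestamp) {
        this.timestamp = timestamp;
    }

    long getTimestamp() {
        return timestamp;
    }
}

// Hypothetical Post subclass produced by a Quora profile scraper.
class QuoraPost extends Post {
    private final String profile;

    QuoraPost(String profile, long timestamp) {
        super(timestamp);
        this.profile = profile;
    }

    String getProfile() {
        return profile;
    }
}

// Sketch of a Timeline-like container that iterates Post objects
// rather than MessageEntry objects.
class Timeline2 implements Iterable<Post> {
    private final List<Post> posts = new ArrayList<>();

    void add(Post post) {
        posts.add(post);
    }

    int size() {
        return posts.size();
    }

    @Override
    public Iterator<Post> iterator() {
        // Read-only view so callers cannot modify the timeline while iterating.
        return Collections.unmodifiableList(posts).iterator();
    }
}
```

Because it implements Iterable<Post>, callers can loop with for (Post p : timeline) regardless of which scraper produced the posts.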

To test: http://127.0.0.1:9000/api/quoraprofilescraper?query=Vibhor-Verma-5

EDIT 1: I have added TODOs where I still have to make changes.

EDIT 2:
The TODOs refer to:

  1. Lines that use Timeline2 in a signature. These signatures need to be fixed once Timeline is replaced by the Timeline2 class; Timeline2 is a temporary class.
  2. Dummy variables: these store the parameters that can be fetched, and I have used them as placeholders. The GET parameters still have to be fetched as extra parameters.

EDIT 3: Diff Timeline.java against Timeline2.java; the two are nearly identical.

Short description

I have:

  • There is a corresponding issue for this pull request.
  • Mentioned the issue number in the pull request commit message (Fixes #<number> commit message)
  • There is strictly only one commit per issue.

For the reviewers

I have:

  • Had this pull request reviewed by an authorized contributor.
  • The reviewer is assigned to the pull request.

@vibhcool vibhcool changed the title Setup post and basescraper with QuoraProfileScraper [WIP] : Setup post and basescraper with QuoraProfileScraper Jun 15, 2017
@vibhcool vibhcool changed the title [WIP] : Setup post and basescraper with QuoraProfileScraper Setup post and basescraper with QuoraProfileScraper Jun 15, 2017
Member

@kavithaenair kavithaenair left a comment


Check the import statements and put them in lexical order.

import java.lang.StringBuilder;
import org.loklak.server.AbstractAPIHandler;
import org.loklak.data.DAO;
import java.net.URL;
Member

Follow lexical order here as well.


import org.json.JSONObject;
import org.loklak.data.DAO;
import java.net.URL;
Member

Here too.

public void setPostId() { }

//TODO: Set up TwitterTweet before setting this as abstract
public String getPostId() { return new String(); }
Member

This doesn't look neat. Can you follow the approach used in line 44?

}
Object object = this.opt(key);
if (object == null) {
throw new JSONException("JSONObject[" + quote(key) + "] not found.");
//throw new JSONException("JSONObject[" + quote(key) + "] not found.");
}
Member

These are from the org/json library. Editing it is really not recommended.

Member Author

fixed this

});
*/
Member

Any reason why this is entirely commented out?

Member Author

fixed this

@sudheesh001
Member

Please provide a test link.


@hemantjadon hemantjadon left a comment


@vibhcool I am not really sure about this approach in BaseScraper: providing the URL in many chunks (baseUrl, midUrl, query, extra). I am not sure this helps in any way when instantiating the class, and IMO it will confuse a developer who wants to extend this class to modify something. This is tricky because a URL is inherently a complete string with only four parts: the protocol (http/https), the domain (in some cases a subdomain), the remaining path, and some query params. Splitting the URL using our own terminology might create confusion. Is there a solid reason to do so?
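For comparison, the standard java.net.URI class already splits a URL into the parts listed above. A minimal sketch using the test link from this PR (the UrlParts class and its split method are illustrative names, not part of loklak):

```java
import java.net.URI;

// Splits a URL into its standard components using java.net.URI.
class UrlParts {
    static String[] split(String url) {
        URI uri = URI.create(url); // throws IllegalArgumentException on malformed input
        return new String[] {
            uri.getScheme(),               // protocol, e.g. "http"
            uri.getHost(),                 // domain or IP address
            String.valueOf(uri.getPort()), // "-1" if no explicit port is given
            uri.getPath(),                 // path part
            uri.getQuery()                 // decoded query string, or null
        };
    }

    public static void main(String[] args) {
        String[] parts = split("http://127.0.0.1:9000/api/quoraprofilescraper?query=Vibhor-Verma-5");
        for (String part : parts) {
            System.out.println(part);
        }
    }
}
```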

@vibhcool
Member Author

@hemantjadon, I have used baseUrl, which points to the website.
midUrl points to the part (or webpage) of the website we want to scrape.
query is used by various methods in the program.
I planned to fetch extra parameters from the request body instead of from the URL string. I have declared it but not used it yet.
I will add all parameters to the request body, as suggested by @singhpratyush, and keep the URL as a string of baseUrl and midUrl.
That way, I am trying to keep everything hassle-free :)
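A minimal sketch of the plan above, keeping the target URL as baseUrl plus midUrl and appending the query. The helper name, the Quora base URL, and the "profile" path segment are assumptions for illustration; the actual BaseScraper fields and how the query is appended may differ.

```java
// Hypothetical helper illustrating baseUrl + midUrl + query; not the actual BaseScraper API.
class ScraperUrl {
    // Joins the non-empty parts with exactly one '/' between them.
    // Assumes query is already URL-safe (e.g. a profile slug like "Vibhor-Verma-5").
    static String buildUrl(String baseUrl, String midUrl, String query) {
        StringBuilder url = new StringBuilder(trimSlashes(baseUrl));
        for (String part : new String[] {midUrl, query}) {
            String trimmed = trimSlashes(part);
            if (!trimmed.isEmpty()) {
                url.append('/').append(trimmed);
            }
        }
        return url.toString();
    }

    // Removes leading and trailing '/' characters only; inner slashes are kept.
    private static String trimSlashes(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && s.charAt(start) == '/') start++;
        while (end > start && s.charAt(end - 1) == '/') end--;
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("https://www.quora.com/", "profile/", "Vibhor-Verma-5"));
    }
}
```

Trimming slashes on each part avoids doubled or missing separators when callers are inconsistent about trailing slashes. A free-text query would need encoding before being used as a path segment.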

@vibhcool vibhcool force-pushed the 1231 branch 7 times, most recently from 3ac2401 to bcb0243 Compare June 22, 2017 17:53
@vibhcool
Member Author

vibhcool commented Jun 22, 2017

@sudheesh001 I have made all the suggested changes and added a test link :)
@kavithaenair I have fixed the syntax errors and Codacy errors (some remaining ones will be fixed in upcoming PRs) :)
Please re-review.

Contributor

@Achint08 Achint08 left a comment


LGTM 👍

return new Date(this.timestamp);
}

//TODO: Set up TwitterTweet before setting this as abstract
Member

Why?

Member Author

I have inherited Post here -> https://github.com/loklak/loklak_server/pull/1249/files#diff-c7c5000d9cf0abf8e64f6de0914d35d3
If I declare these methods abstract in Post right now, every class that inherits AbstractObjectEntry will have to define them, and not all of those child classes are scrapers.
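The trade-off described above, sketched with a simplified, hypothetical hierarchy that assumes (as the comment suggests) that AbstractObjectEntry extends Post. A concrete placeholder getPostId() in Post keeps non-scraper subclasses compiling, whereas an abstract one would force every subclass to implement it. This is not the actual loklak code.

```java
// Hypothetical, simplified hierarchy mirroring the discussion only.
abstract class Post {
    // Concrete placeholder: subclasses compile without overriding it.
    // Declaring it abstract instead would force every subclass to implement it.
    public String getPostId() {
        return "";
    }
}

// Assumed to extend Post, per the comment above.
abstract class AbstractObjectEntry extends Post {
}

// A scraper result that already has a real identifier.
class ScrapedProfile extends AbstractObjectEntry {
    private final String id;

    ScrapedProfile(String id) {
        this.id = id;
    }

    @Override
    public String getPostId() {
        return id;
    }
}

// A non-scraper entry that has not been migrated yet; it still compiles
// because getPostId() is not abstract, and it falls back to "".
class LegacyEntry extends AbstractObjectEntry {
}
```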

Member

not all of those child classes are scrapers

But they can still have a unique identifier.

Member Author

They don't seem to be needed right now, and some of those classes are unrelated to what I am trying to achieve. That is why I have made this arrangement: just to make TwitterTweet a subclass of Post and get its results at the top. 😅
See PR #1277


public static enum Order {
CREATED_AT("date"),
TIMESTAMP("long"),
Contributor

@Achint08 Achint08 Jun 23, 2017


Just wanted to know: why is TIMESTAMP assigned the long data type?


@Achint08 The timestamp is an unsigned 64-bit integer (U64), that's why. :)

Member Author

timestamp is the number of seconds since 1 Jan 1970. It is used for creating the PostId, and one can also create a date from it.
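For reference, two Java facts relevant to this thread: long is a signed 64-bit type (Java has no unsigned primitive long), and the java.util.Date(long) constructor expects milliseconds, not seconds, since 1970-01-01T00:00:00Z. If timestamp holds seconds, it needs converting before being passed to new Date(...). A minimal sketch with hypothetical helper names:

```java
import java.util.Date;
import java.util.concurrent.TimeUnit;

class EpochTime {
    // Converts seconds since the Unix epoch to a java.util.Date,
    // whose constructor expects milliseconds since the epoch.
    static Date fromEpochSeconds(long epochSeconds) {
        return new Date(TimeUnit.SECONDS.toMillis(epochSeconds));
    }

    // Converts a Date back to whole seconds since the epoch (truncating).
    static long toEpochSeconds(Date date) {
        return TimeUnit.MILLISECONDS.toSeconds(date.getTime());
    }

    public static void main(String[] args) {
        System.out.println(Long.MAX_VALUE);                 // largest signed 64-bit value
        System.out.println(fromEpochSeconds(0L).getTime()); // 0
    }
}
```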

Contributor

Ok ok cool.

@@ -0,0 +1,457 @@
/**
Member

@vibhcool: Could you please point out the things in this class that are different from Timeline?

Member Author

This is the Timeline class, except that it iterates over Post objects instead of MessageEntry objects.

}

//TODO: this passes Timeline as argument
public void writeToIndex() {
Member

If the plan is to write things to ES index, I think that Timeline should also define the index name.

Also, as the number of scrapers increases, the initialisation of ES node should automatically create indices. How can this be handled?
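One possible shape for this, sketched under assumptions: each scraper declares the Elasticsearch index it writes to, and node initialisation collects the distinct names from all registered scrapers so missing indices can be created. IndexedScraper, IndexRegistry, and the index names are hypothetical, and the actual index-creation call is left out rather than guessing at the Elasticsearch client API.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical contract: every scraper names the index its timeline is written to.
interface IndexedScraper {
    String getIndexName();
}

class IndexRegistry {
    // Collects the distinct index names, in registration order, that must exist
    // before any timeline is written. Creating them is left to the ES client.
    static Set<String> requiredIndices(List<? extends IndexedScraper> scrapers) {
        Set<String> names = new LinkedHashSet<>();
        for (IndexedScraper scraper : scrapers) {
            names.add(scraper.getIndexName());
        }
        return names;
    }

    public static void main(String[] args) {
        List<IndexedScraper> scrapers = Arrays.asList(
                () -> "quora_profiles",   // hypothetical index names
                () -> "github_profiles",
                () -> "quora_profiles");
        System.out.println(requiredIndices(scrapers)); // [quora_profiles, github_profiles]
    }
}
```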

Member Author

@vibhcool vibhcool Jun 28, 2017


I just commented out this part. It isn't relevant now but may be needed once I get several scrapers working together. I added a TODO to work on this in another PR. This is pre-existing code from Timeline.java.

Member

That is what I am talking about. Once you get more scrapers into work, you'll need to generalise the class for them. One of the requirements for it would be defining the index in which the data would need to get pushed.

Member Author

@singhpratyush @sudheesh001, regarding the structure of the ES index: the existing structure can work with multiple scrapers. Some changes I think are needed:

  1. Presently, an IndexEntry object acts as the interface between Elasticsearch and the scraper system. Its generic type's superclass needs to be changed to Post.
  2. There are two possible approaches: an index per scraper or a document per scraper. The existing structure can work with either; a few lines of code will need to be added to set this up.

The 2nd point needs some discussion. I will create an issue for it. :)
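Point 1 amounts to bounding IndexEntry's type parameter by Post. A hypothetical, heavily simplified sketch (the real IndexEntry and Post have different fields and constructors): the scraper name stands in for point 2 and could be used either as the index name or as a field inside a shared index.

```java
// Simplified stand-in for org.loklak.harvester.Post.
abstract class Post {
    abstract String getPostId();
}

class QuoraProfilePost extends Post {
    private final String postId;

    QuoraProfilePost(String postId) {
        this.postId = postId;
    }

    @Override
    String getPostId() {
        return postId;
    }
}

// Hypothetical IndexEntry whose generic type is bounded by Post, so any scraper's
// output can be indexed and its ID is always available as the document ID.
class IndexEntry<T extends Post> {
    private final String scraperName; // index name or "scraper" field, depending on approach
    private final T object;

    IndexEntry(String scraperName, T object) {
        this.scraperName = scraperName;
        this.object = object;
    }

    String getId() {
        return object.getPostId();
    }

    String getScraperName() {
        return scraperName;
    }

    T getObject() {
        return object;
    }
}
```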

Member Author

@singhpratyush
Member

@vibhcool: The test link is not operational. Please check.

@vibhcool
Member Author

@singhpratyush updated the test link, please re-review :)

@SKrPl
Contributor

SKrPl commented Jun 29, 2017

@vibhcool Please look into codacy issues.

@vibhcool
Member Author

@SKrPl I have fixed the Codacy issues. The 2 issues still showing up need to stay until TwitterScraper is refactored. 😅

Member

@kavithaenair kavithaenair left a comment


Rebase the branch. Rest LGTM 👍

@sudheesh001
Member

I like the updates in this pull request; however, I agree with @singhpratyush's suggestion that writing to the index should be a separate class, so that as more scrapers are added there is a single general interface for pushing to the local Elasticsearch index.
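A minimal sketch of that suggestion, under assumptions: one interface through which every scraper pushes its posts, plus an in-memory implementation that is handy for tests. IndexWriter, InMemoryIndexWriter, and the simplified Post are hypothetical names; a real implementation would call the Elasticsearch client instead of storing posts in a map.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for org.loklak.harvester.Post.
class Post {
    private final String postId;

    Post(String postId) {
        this.postId = postId;
    }

    String getPostId() {
        return postId;
    }
}

// Hypothetical single entry point for writing any scraper's output to an index.
interface IndexWriter {
    void write(String indexName, List<? extends Post> posts);
}

// In-memory implementation for tests; documents are keyed by post ID,
// so writing the same post twice overwrites instead of duplicating.
class InMemoryIndexWriter implements IndexWriter {
    private final Map<String, Map<String, Post>> indices = new HashMap<>();

    @Override
    public void write(String indexName, List<? extends Post> posts) {
        Map<String, Post> index = indices.computeIfAbsent(indexName, k -> new HashMap<>());
        for (Post post : posts) {
            index.put(post.getPostId(), post);
        }
    }

    // Number of distinct documents stored in the given index (0 if it does not exist).
    int count(String indexName) {
        return indices.getOrDefault(indexName, Collections.emptyMap()).size();
    }
}
```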

setup post and basescraper
}

//TODO: Set up TwitterTweet before setting this as abstract
public void setPostId() { }
Member

No parameter in setter method?

Member Author

https://dzone.com/articles/getter-setter-use-or-not-use

Go through the TODO mentioned; it is to be made abstract.

Member

Even if you set it to abstract, it should take some argument to which it can set the ID to.

I still don't understand why making this abstract now won't work but making it abstract after introducing TwitterScraper will. If you don't have a unique identifier for QuoraScraper now, how will you produce one when TwitterScraper is introduced?

And I still think that there can be a unique identifier for each Quora profile that is scraped.
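Along the lines of this review comment, a hypothetical sketch of a setter that takes the ID as an argument, plus one possible way to derive a unique ID for a scraped Quora profile. The thread mentions that the timestamp is used for creating the PostId; combining it with the profile name as below is an assumption, not the scheme used in this PR.

```java
// Hypothetical simplified Post with an ID setter that takes an argument.
abstract class Post {
    private String postId = "";

    public String getPostId() {
        return postId;
    }

    // final so that subclasses can safely call it from their constructors.
    public final void setPostId(String postId) {
        if (postId == null || postId.isEmpty()) {
            throw new IllegalArgumentException("postId must not be empty");
        }
        this.postId = postId;
    }
}

class QuoraProfile extends Post {
    // Hypothetical ID scheme: profile slug plus scrape timestamp.
    QuoraProfile(String profileName, long timestamp) {
        setPostId(profileName + "_" + timestamp);
    }
}
```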

Member Author

made the changes

//public abstract String setPostId();
//TODO: Set up TwitterTweet before setting this as abstract
public String getPostId() {
return new String();
Member

Better to return "";.
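For context on this suggestion: new String() allocates a fresh empty object on every call, while the literal "" always refers to the same interned instance. Both are equal under equals(), so the change only avoids a needless allocation. A small illustrative sketch:

```java
class EmptyStringDemo {
    static String withNew() {
        return new String(); // a new object on each call
    }

    static String withLiteral() {
        return ""; // the same interned instance every time
    }

    public static void main(String[] args) {
        System.out.println(withNew().equals(withLiteral())); // true
        System.out.println(withNew() == withNew());          // false: distinct objects
        System.out.println(withLiteral() == withLiteral());  // true: same literal
    }
}
```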

Member Author

Doesn't matter, see the TODO mentioned.

Member Author

:)

Member Author

made the changes

Member

@singhpratyush singhpratyush left a comment


Left a few comments. Please take a look.

@codecov-io

codecov-io commented Jul 4, 2017

Codecov Report

Merging #1249 into development will decrease coverage by 0.02%.
The diff coverage is 0.98%.


@@               Coverage Diff                @@
##             development   #1249      +/-   ##
================================================
- Coverage           9.06%   9.04%   -0.03%     
- Complexity           393     396       +3     
================================================
  Files                199     200       +1     
  Lines              17214   17403     +189     
  Branches            3223    3252      +29     
================================================
+ Hits                1561    1574      +13     
- Misses             15346   15522     +176     
  Partials             307     307
Impacted Files                                      Coverage Δ                Complexity Δ
src/org/loklak/susi/SusiThought.java                15.38% <ø> (ø)            5 <0> (ø)       ⬇️
src/org/loklak/harvester/BaseScraper.java           0% <0%> (ø)               0 <0> (ø)       ⬇️
src/org/loklak/api/search/QuoraProfileScraper.java  0% <0%> (ø)               0 <0> (ø)       ⬇️
src/org/loklak/objects/AbstractObjectEntry.java     5.88% <0%> (ø)            5 <1> (ø)       ⬇️
src/org/loklak/api/search/ConsoleService.java       0% <0%> (ø)               0 <0> (ø)       ⬇️
src/org/loklak/harvester/Post.java                  58.82% <0%> (+58.82%)     2 <0> (+2)      ⬆️
src/org/loklak/objects/Timeline2.java               0% <0%> (ø)               0 <0> (?)
src/org/loklak/objects/MessageEntry.java            24.93% <22.22%> (-0.07%)  23 <0> (ø)
src/org/json/JSONObject.java                        22.87% <0%> (+0.34%)      57% <0%> (+1%)  ⬆️
... and 1 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 3aef8be...24911f1.

@vibhcool vibhcool force-pushed the 1231 branch 2 times, most recently from 4ad6c84 to 28efecf Compare July 4, 2017 08:24
Member

@singhpratyush singhpratyush left a comment


I am approving this as this is required to introduce other components and other issues can be taken care of as we introduce them.

Further, the idea of writing to the index needs careful thought so that the plan of "indexing everything" works.

@mariobehling mariobehling merged commit 02a5920 into loklak:development Jul 4, 2017
mariobehling pushed a commit that referenced this pull request Jul 5, 2017
* deploy button info for docker #1001

* Fixes #1045 : Replace the image logo in navigation bar with a text


Fixes #1045 : Replace the image logo in navigation bar with a text

* Fixes #1048: fix execution of method without query string

* fix for latest twitter html change

* add docker status badge

This is related to issue #1049

* update documentation path

Problem: The documentation has moved. The links in the README are outdated
Solution: insert the containing folder into the path of all links to docs

* moved dockerfile to root folder #1049

* changed travis build to new location

* updated Dockerfile path for compose

This is for issue #1049.
The pull request #1050 is a precondition for this to make sense.

Problem: the Dockerfile was moved to /
Solution: adapt the path

* get aggregations also with fresh requests from twitter with source=all

* Fixes #1060: Increase default Xmx value

* Fixed #1059 - Remove and Ignore .DS_Store

* Fixes #1067 - Tweet URL in README is broken

* corrected heading

"Where do I find the java?" ->"Where do I find the Java documentation?"

* Using the note directive of sphinx

See #1042

* README.md upd, useful links added

* Fixes #1033, loklak_server README.md upd, links updated with link syntax

* Move documentation site

The documentation site is now moved to https://github.com/loklak/dev.loklak.org

Closes #1014

* fix username emoji in tweet

* Fix unused imports in python files(codacy issue)

Related to #1070

* removed .DS_Store

* added .DS_Store to gitignore

* Fix use of Null in scala code

Related to #1070

* fixed scraper

* Edited Readme

* Add update trigger script for docs

Closes #1003

* Creating Volume for persistence while deploying via docker, fix #1051 (#1089)

* Update Dockerfile

* Update Dockerfile

* Update docker-compose.yml

* Update docker-compose.yml

* Update docker-compose.yml

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* Update docker-compose.yml

* Update docker-compose.yml

* Update docker-compose.yml

* Update docker-compose.yml

* updated docker build badge

I changed the url so github requests a new image.
The build works.
https://hub.docker.com/r/mariobehling/loklak/builds/

* Docker: Consistent Volume Path

Problem: docker-compose volume path is not the same as the dockerfile volume path
Solution: Set the docker-compose volume path to the dockerfile volume path

You can view the correct path in the Dockerfile:
https://github.com/loklak/loklak_server/blob/7a1f0378dc40ec25eec6083e43558a62408d84e8/Dockerfile#L38
I checked in the container:
```
bash-4.3# ls /loklak_server/
bin              conf             gradlew          settings.gradle
build            data             html             src
build.gradle     gradle           installation     ssi
bash-4.3# ls /
bin            lib            proc           srv            var
dev            loklak_server  root           sys
etc            media          run            tmp
home           mnt            sbin           usr
```
the data directory exists and is filled within `/loklak_server`

* .travis.yml: Add keys for dev.loklak.org

Closes #1091

* fix initGet

* option to autodelete messages after one month from the main index

* disabling feature introduced with
27272ee
for issue #919

Storing the settings file caused it to break. It blew up to a huge
file, like
$ ls -l customized_config.properties
-rw-r--r-- 1 loklak loklak 251650030 Apr 10 19:08
customized_config.properties

This is the main reason loklak.org has been down since this feature was
introduced.

* Fixes #1099 : Changes the href link of the button download, install and extend

* fix #1056 - document how to start contributing (#1063)

* Added JS EventListener to resize dump iframe on load. Closes #1101

* Add Unit Tests to Loklak Server (#1098)

* Add unit tests for TwitterScraper.java

* Add data file to test JSONRandomAccessFileTest.java

* set up unit tests build in loklak Server

* fix changes requested and codacy issues

* fixes scrollbar event

* In the Twitter scraper, use a more readable version of assert; also fix a bug with parsing longs in the YouTube scraper (parsing failed because spaces were not removed), and add a unit test for the YouTube scraper.

* fix bug with youtube scraper and add unit test for scraper

* Fixes #1103: Changed the URLs to the correct ones (#1104)

* Fixes #1103: Changed the URLs to the correct ones

* Fixes #1108: Fixed the typos in documentation

* fix and modify the GithubProfileScraper.java

* fixes #961: add query in KaizenHarverster's queue to get older Tweets

If the current timeline's query already has an until statement, replace its date part with the oldest one. Also add a DateFormat object in KaizenHarverster to format a Date as a String of the form yyyy-MM-dd.

* fix eclipse classpath for storing classes (#1097)

* Fixes #1123: Adding Gemnasium Button & Fixing Docker build button

* Fix Codacy issue in Timeline.java. Related #1070

Link to codacy: https://www.codacy.com/app/sudheesh1995/loklak_server/file/6470204147/issues/source?bid=3495500&fileBranchId=3495500
Description: Fields should be declared at the top of the class

* Fix Codacy issue for some files in org.loklak.server.api. Related #1070

* ConsoleService.java
  - Fields should be declared at the top of the class
  - https://www.codacy.com/app/sudheesh1995/loklak_server/file/6484902617/issues/source?bid=3495500&fileBranchId=3495500

* EventBriteCrawler.java
  - Make spacing consistent for conditionals

* GraphServlet.java
  - Reduce complexity of doGet method
  - https://www.codacy.com/app/sudheesh1995/loklak_server/file/6484903642/issues/source?bid=3495500&fileBranchId=3495500

* Rename Dockerfile-learnings.md to docs/Dockerfile-learnings.md

* fix #1138: Correct spelling mistake in README.md (#1140)

Change "descripe" to "describe" in How to Contribute section.

* Fixes #1123: Adding Gemnasium Button & Fixing Docker build button in rst file (#1137)

* Related #1070: Fix Codacy issues for files in org.loklak.api.search (#1134)

* EventBriteCrawlerService.java
  - Use one line for each declaration
  - https://www.codacy.com/app/sudheesh1995/loklak_server/file/6497733425/issues/source?bid=3495500&fileBranchId=3495500

* GenericScraper.java
  - Indentation fix
  - New line before EOF

* GithubProfileScraper.java
  - Remove trailing whitespaces

* MeetupsCrawlerService.java
  - Use one line for each declaration
  - https://www.codacy.com/app/sudheesh1995/loklak_server/file/6497733676/issues/source?bid=3495500&fileBranchId=3495500

* SearchServlet.java
  - Indentation fix

* SuggestServlet.java
  - Position literals first in String comparisons
  - Fields should be declared at the top of the class
  - https://www.codacy.com/app/sudheesh1995/loklak_server/file/6497733640/issues/source?bid=3495500&fileBranchId=3495500

* WeiboUserInfo.java
  - Switch statements should have a default label
  - https://www.codacy.com/app/sudheesh1995/loklak_server/file/6497733550/issues/source?bid=3495500&fileBranchId=3495500

* Fixes #1139: Changed the URL (#1141)

* Fixes #1139: Changed the URL

* Fixes #1139: Changed the URL

* Fix "Strings must use doublequote. (quotes)"
Related to #1070

* Fix #1070: Strings must use doublequote. (quotes), no-use-before-define

* Fix #1070: Strings must use doublequote. (quotes), no-use-before-define

* Fixes #1070:Strings must use double quotes, no-use-before-define

* Related to #1070:Strings must use double quotes, no-use-before-define

* Related #1058: Add Kaizen harvester usage documentation (#1145)

* Fix #1070: Strings must use doublequote. (quotes), no-use-before-define (#1121)

* Fix "Strings must use doublequote. (quotes)"
Related to #1070

* Fix #1070: Strings must use doublequote. (quotes), no-use-before-define

* Fix #1070: Strings must use doublequote. (quotes), no-use-before-define

* fix #1130: Make retries and back off parameter for backend push configurable (#1131)

These variables can be set from config.properties by changing/defining caretaker.backendpush.retries and caretaker.backendpush.backoff respectively.

* Fixes part of #1132: Add unit test to check TwitterScraper output (#1133)

* convert markdown file to rst (#1142)

* Merged development fixed conflict.

* Improve code quality for org.loklak.geo.*

* Related #1070: Improve code quality for org.loklak.api.admin.* (#1149)

* Related to #1070: Improve code quality for org.loklak.Crawler.java

* fix related to #1152: code refactoring for logging (#1153)

* fix related to #1133: fix access specifiers (#1151)

* fixes #1161: Add GCloud Kubernetes deployment document for loklak (#1162)

* fixes #1146: Check for TwitterFactory before getting instance (#1147)

* Related #1070: Fix Codacy issues for org.loklak.api.amazon.* (#1163)

* fix #1143: Fix NumberException in YoutubeScraper (#1157)

* Installation and Start on a user specified port (#1159)

Solves issue: #925

* Fixes #1165: Fixed the QuoraProfileScraper and displaying profileImage

* Related to #1112:Add filter for images, videos (#1164)

* Related #1156: Make harvesting decision biased for Kaizen (#1158)

A probability is chosen as queuries.size() / QUERIES_LIMIT, which is compared to a randomly chosen target probability and decision is taken accordingly. In case of no limit on the queue size, probability to harvest is set to 0.5.

* Fixes #1167 GithubScraperService able to scrape user specific data (#1168)

Fixes issue #1167. The githubprofilescraper service now displays starred_url,
number of starred repos, followers_url, number of followers, following_url,
and number of people following for a particular user.

* fixes #1114 Improve URL shortening service

* Include all 30X HTTP response code while checking for redirect.
* Use POST requests as a fallback for GET requests: there are many cases (mostly https?://fb.me/*) where GET requests return status 400: Bad Request, while POST requests work fine. The patch makes a POST attempt in such cases and fetches the result.
* Try to fetch URL from <meta/> tag in response body in case of non redirect status code.
* Check the validity of URL shortening only once, and not for each intermediate URL.

* Displays proper url to open loklak_server

Solves issue: #1172

Displays proper localhost url in which loklak_server is running after
the execution of bin/start.sh or bin/installation.sh with a "p" flag.

Previously, the displayed localhost url always ended in port 9000 for
bin/start.sh, and for bin/installation.sh the running port was appended
to 9000. For example:
http://localhost:9000 # bin/start.sh, actual port 8888
http://localhost:90008888 #bin/installation.sh, actual port 8888

* fixes #1177 - Added tests for WordpressCrawlerService.java

fixes issue #1177. Added tests for WordpressCrawlerService.java and
also removed the leading 'Author' from the author field in json
output.

* fix #1176: Fetch debug flag from config file

Change configurations for TwitterScraper and ClientConnection

* fixes #1184 - Instagram Profile Scraper is now working

fixes issue #1184. Instagram scraper is now returning data.

* fix #1179: Use java.net.URL to build relative URL in ClientConnection (#1183)

* fixes #1070: Add test for URL unshortening (#1173)

* fixes #1169 - Added test for Github profile scraper (#1185)

fixes issue #1169, Added tests for GithubProfileScraper service.

* Improve code quality for some files in org.loklak.api.cms and add checkstyle as gradle task (#1187)

* Related #1070: Improve code quality for some files in org.loklak.api.cms

Fixes are done using checkstyle with google_check.xml config and 4 space indentation level

* Add checkstyle check as gradle task

* Fixes #1191: NullPointerException in CareTaker.java (#1192)

* Auto-generate docs in dev.loklak.org repository (#1195)

* Fix #1171: Extract video URLs from IFrame (#1193)

Videos are added as an IFrame for Twitter. To fetch the video URLs, we first fetch the IFrame page and then check for the video format. If it is mp4, we're done. If it is m3u8, we need to fetch the m3u8 link in order to get actual videos. Mostly, these videos are of .ts format.

Also add org.unbescape as gradle dependency to unescape string in iframe.

* Fix #1201: Break down KaizenHarvester into simpler pieces (#1203)

Introduce KaizenQuery class to support different methods to store queries that Kaizen needs to process

* Fix #1208: Add .editorconfig (#1209)

* Fixes #1204 Add subtree if not already added (#1207)

* Fix #1205: Extract complete video URLs for Tweets (#1206)

This implementation mimics the video playback flow of mobile react app of Twitter.
    1. Extract BEARER_TOKEN holding script's URL.
    2. Extract guest session token.
    3. Extract BEARER_TOKEN from URL in 1.
    4. Make Twitter API call with the parameters.

* fixes #1196 - Enhanced Quora profile scraper #1199 (#1200)

Fixes issue #1196 The scraper now provides more information like
university of user, location where user works, topics he knows, number of
followers, number of questions, number of edits, number of blogs etc.

* Fix #1188: Use unbescape to unescape HTML in html2utf8 (#1194)

Also improve whitespace cleaning in the method. Move old implementation to html2utf8Custom.

* Fixes #1097: Restore access specifiers in TwitterScraper.java (#1198)

* Fix indentation (#1211)

* Fix #1212: fix checkstyle errors(except missing javadoc) (#1218)

* Fixes #1215 fix syntax error in the script (#1217)

* Fix #1213: Include videos for testing TwitterScraper (#1221)

* Fix 1216: Revert "Installation and Start on a user specified port (#1159)" (#1227)

This reverts commit 1e0bcd5.

Conflicts (resolved):
	bin/installation.sh
	bin/start.sh

* Fixes #1202: Modify loggers in Loklak Server for testing (#1222)

* Fixes #1219: Add UTC time in TimeAndDateService (#1220)

* Fixes #1112: Add image, video filter constraints for cache (#1190)

* Fixes #1236: Update Docs for get parameter (#1237)

* Fixes #1226 Build error currently showing (#1228)

* Fixes 1215 Fix relative link

* Update git to work with subtree

* Adding echo statements

* Fix #1239: Correct flag values in config.properties

* Fix #1238: Add PriorityQueue harvesting strategy (#1240)

Also add score related to each Tweet based on retweet and favourite count.

* Fix #1251: Correct test case for RedirectUnshortener (#1253)

http://t.co/E3w7s2qdBT now points to http://www.mostviralfeed.com/what-lady-gaga-actually-looks-like instead of http://mostviralfeed.com/what-lady-gaga-actually-looks-like

* Fix #1247: Add function to collect stats about all classes for a classifier (#1248)

* Fix #1256: Add classifier.json endpoint to serve aggregated data (#1257)

* refactoring to have the same naming as in susi_server

* Fixes #1261: RedirectUnshortener link fix (#1262)

* Fixes #1229, #1235, Related #1230: Setup of testable version (#1250)

1) Set up Post and BaseScraper

2) Set up QuoraProfileScraper with BaseScraper and Post

* Fix #1259: Add function for time sensitive aggregation (#1260)

* Fix #1271: Correct redirect link in test (#1272)

* Fix #1266: Allow time based aggregation in /api/classifier.json (#1267)

* Fix #1278: Correct typo in kaizen.md (#1279)

* enhanced elasticsearch mapping

* eclipse classpath to use same as gradle

* removed unused imports

* Fix #1268: Add function for aggregation based on country codes (#1270)

The following operations are now possible:
* All time aggregation for all countries
* Time sensitive aggregation for all countries
* All previous aggregations for selected countries

* Fix #1273: Add Jacoco to provide coverage report in XML format (#1274)

* Fixes #1284: Improve test cases for URL unshortener (#1285)

* Setup post and basescraper with QuoraProfileScraper (#1249)

* Setup of testable version

Set up Post and BaseScraper

* Related #1230, #1231, #1244: Integrate Timeline2 with QuoraProfileScraper

* Configure ssh agent before push
vibhcool added a commit to vibhcool/loklak_server that referenced this pull request Jul 6, 2017
* deploy button info for docker loklak#1001

* Fixes loklak#1045 : Replace the image logo in navigation bar with a text

Fixes loklak#1045 : Replace the image logo in navigation bar with a text

* Fixes loklak#1048: fix execution of method without query string

* fix for latest twitter html change

* add docker status badge

This is related to issue loklak#1049

* update documentation path

Problem: The documentation has moved. The links in the README are outdated
Solution: insert the containing folder into the path of all links to docs

* moved dockerfile to root folder loklak#1049

* changed travis build to new location

* updated Dockerfile path for compose

This is for issue loklak#1049.
The pull request loklak#1050 is a precondition for this to make sense.

Problem: the Dockerfile was moved to /
Solution: adapt the path

* get aggregations also with fresh requests from twitter with source=all

* Fixes loklak#1060: Increase default Xmx value

* Fixed loklak#1059 - Remove and Ignore .DS_Store

* Fixes loklak#1067 - Tweet URL in README is broken

* corrected heading

"Where do I find the java?" ->"Where do I find the Java documentation?"

* Using the note directive of sphinx

See loklak#1042

* README.md updated: useful links added

* Fixes loklak#1033: Update loklak_server README.md links to use link syntax

* Move documentation site

The documentation site is now moved to https://github.com/loklak/dev.loklak.org

Closes loklak#1014

* fix username emoji in tweet

* Fix unused imports in Python files (Codacy issue)

Related to loklak#1070

* removed .DS_Store

* added .DS_Store to gitignore

* Fix use of Null in scala code

Related to loklak#1070

* fixed scraper

* Edited Readme

* Add update trigger script for docs

Closes loklak#1003

* Creating Volume for persistence while deploying via docker, fix loklak#1051 (loklak#1089)

* Update Dockerfile

* Update Dockerfile

* Update docker-compose.yml

* Update docker-compose.yml

* Update docker-compose.yml

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* Update docker-compose.yml

* Update docker-compose.yml

* Update docker-compose.yml

* Update docker-compose.yml

* updated docker build badge

I changed the URL so GitHub requests a new image.
The build works.
https://hub.docker.com/r/mariobehling/loklak/builds/

* Docker: Consistent Volume Path

Problem: docker-compose volume path is not the same as the dockerfile volume path
Solution: Set the docker-compose volume path to the dockerfile volume path

You can view the correct path in the Dockerfile:
https://github.com/loklak/loklak_server/blob/7a1f0378dc40ec25eec6083e43558a62408d84e8/Dockerfile#L38
I checked in the container:
```
bash-4.3# ls /loklak_server/
bin              conf             gradlew          settings.gradle
build            data             html             src
build.gradle     gradle           installation     ssi
bash-4.3# ls /
bin            lib            proc           srv            var
dev            loklak_server  root           sys
etc            media          run            tmp
home           mnt            sbin           usr
```
The data directory exists and is populated within `/loklak_server`.

* .travis.yml: Add keys for dev.loklak.org

Closes loklak#1091

* fix initGet

* option to autodelete messages after one month from the main index

* disabling feature introduced with
27272ee
for issue loklak#919

Storing the settings file broke it: the file blew up to a huge size, e.g.
$ ls -l customized_config.properties
-rw-r--r-- 1 loklak loklak 251650030 Apr 10 19:08
customized_config.properties

This is the main reason loklak.org has been down since this feature was
introduced.

* Fixes loklak#1099: Change the href links of the download, install and extend buttons

* fix loklak#1056 - document how to start contributing (loklak#1063)

* Added JS EventListener to resize dump iframe on load. Closes loklak#1101

* Add Unit Tests to Loklak Server (loklak#1098)

* Add unit tests for TwitterScraper.java

* Add data file to test JSONRandomAccessFileTest.java

* set up unit tests build in loklak Server

* fix changes requested and codacy issues

* fixes scrollbar event

* Use a more readable version of assert in the Twitter scraper, fix a bug with parsing longs in the YouTube scraper (the Long.parse method failed because spaces were not removed), and add a unit test for the YouTube scraper.

* Fix bug in the YouTube scraper and add a unit test for the scraper

* Fixes loklak#1103: Changed the URLs to the correct ones (loklak#1104)

* Fixes loklak#1103: Changed the URLs to the correct ones

* Fixes loklak#1108: Fixed the typos in documentation

* fix and modify the GithubProfileScraper.java

* fixes loklak#961: Add query to KaizenHarvester's queue to get older Tweets

If the current timeline's query already has an until statement, replace its date part with the oldest date. Also add a DateFormat object in KaizenHarvester to format a Date as a String in yyyy-MM-dd format.
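The until-replacement rule above might look like the following sketch. It assumes the query carries a Twitter-style `until:yyyy-MM-dd` token and that dates are formatted in UTC; `OlderTweetsQuerySketch.withUntil` is a hypothetical name, not the actual KaizenHarvester method.

```java
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OlderTweetsQuerySketch {
    // Assumed query syntax: an "until:yyyy-MM-dd" token somewhere in the query.
    private static final Pattern UNTIL = Pattern.compile("until:\\d{4}-\\d{2}-\\d{2}");

    static String withUntil(String query, Date oldest) {
        // A fresh SimpleDateFormat per call, because SimpleDateFormat is not thread-safe.
        DateFormat format = new SimpleDateFormat("yyyy-MM-dd");
        format.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: dates are UTC
        String until = "until:" + format.format(oldest);

        Matcher m = UNTIL.matcher(query);
        if (m.find()) {
            // Query already has an until statement: replace its date part.
            return m.replaceFirst(Matcher.quoteReplacement(until));
        }
        // No until statement yet: append one.
        return query + " " + until;
    }
}
```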

* fix eclipse classpath for storing classes (loklak#1097)

* Fixes loklak#1123: Adding Gemnasium Button & Fixing Docker build button

* Fix Codacy issue in Timeline.java. Related loklak#1070

Link to codacy: https://www.codacy.com/app/sudheesh1995/loklak_server/file/6470204147/issues/source?bid=3495500&fileBranchId=3495500
Description: Fields should be declared at the top of the class

* Fix Codacy issue for some files in org.loklak.server.api. Related loklak#1070

* ConsoleService.java
  - Fields should be declared at the top of the class
  - https://www.codacy.com/app/sudheesh1995/loklak_server/file/6484902617/issues/source?bid=3495500&fileBranchId=3495500

* EventBriteCrawler.java
  - Make spacing consistent for conditionals

* GraphServlet.java
  - Reduce complexity of doGet method
  - https://www.codacy.com/app/sudheesh1995/loklak_server/file/6484903642/issues/source?bid=3495500&fileBranchId=3495500

* Rename Dockerfile-learnings.md to docs/Dockerfile-learnings.md

* fix loklak#1138: Correct spelling mistake in README.md (loklak#1140)

Change "descripe" to "describe" in How to Contribute section.

* Fixes loklak#1123: Adding Gemnasium Button & Fixing Docker build button in rst file (loklak#1137)

* Related loklak#1070: Fix Codacy issues for files in org.loklak.api.search (loklak#1134)

* EventBriteCrawlerService.java
  - Use one line for each declaration
  - https://www.codacy.com/app/sudheesh1995/loklak_server/file/6497733425/issues/source?bid=3495500&fileBranchId=3495500

* GenericScraper.java
  - Indentation fix
  - New line before EOF

* GithubProfileScraper.java
  - Remove trailing whitespaces

* MeetupsCrawlerService.java
  - Use one line for each declaration
  - https://www.codacy.com/app/sudheesh1995/loklak_server/file/6497733676/issues/source?bid=3495500&fileBranchId=3495500

* SearchServlet.java
  - Indentation fix

* SuggestServlet.java
  - Position literals first in String comparisons
  - Fields should be declared at the top of the class
  - https://www.codacy.com/app/sudheesh1995/loklak_server/file/6497733640/issues/source?bid=3495500&fileBranchId=3495500

* WeiboUserInfo.java
  - Switch statements should have a default label
  - https://www.codacy.com/app/sudheesh1995/loklak_server/file/6497733550/issues/source?bid=3495500&fileBranchId=3495500

* Fixes loklak#1139: Changed the URL (loklak#1141)

* Fixes loklak#1139: Changed the URL

* Fixes loklak#1139: Changed the URL

* Fix "Strings must use doublequote. (quotes)"
Related to loklak#1070

* Fix loklak#1070: Strings must use doublequote. (quotes), no-use-before-define

* Fix loklak#1070: Strings must use doublequote. (quotes), no-use-before-define

* Fixes loklak#1070:Strings must use double quotes, no-use-before-define

* Related to loklak#1070:Strings must use double quotes, no-use-before-define

* Related loklak#1058: Add Kaizen harvester usage documentation (loklak#1145)

* Fix loklak#1070: Strings must use doublequote. (quotes), no-use-before-define (loklak#1121)

* Fix "Strings must use doublequote. (quotes)"
Related to loklak#1070

* Fix loklak#1070: Strings must use doublequote. (quotes), no-use-before-define

* Fix loklak#1070: Strings must use doublequote. (quotes), no-use-before-define

* fix loklak#1130: Make retries and back off parameter for backend push configurable (loklak#1131)

These variables can be set from config.properties by changing/defining caretaker.backendpush.retries and caretaker.backendpush.backoff respectively.
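A minimal sketch of how those two keys could drive a retry loop. The key names come from the commit message; the default values, the millisecond unit, the constant (non-exponential) backoff, and the class name `BackendPushRetrySketch` are assumptions.

```java
import java.util.Properties;
import java.util.function.BooleanSupplier;

final class BackendPushRetrySketch {
    /**
     * Calls push until it succeeds, retrying up to caretaker.backendpush.retries times
     * and sleeping caretaker.backendpush.backoff milliseconds between attempts.
     * Defaults, unit and constant backoff are assumptions, not loklak's actual behaviour.
     */
    static boolean pushWithRetries(Properties config, BooleanSupplier push) {
        int retries = Integer.parseInt(config.getProperty("caretaker.backendpush.retries", "5"));
        long backoffMillis = Long.parseLong(config.getProperty("caretaker.backendpush.backoff", "3000"));
        for (int attempt = 0; attempt <= retries; attempt++) {
            if (push.getAsBoolean()) {
                return true;
            }
            if (attempt < retries) {
                try {
                    Thread.sleep(backoffMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and give up
                    return false;
                }
            }
        }
        return false;
    }
}
```

Under this sketch, `caretaker.backendpush.retries=2` means at most three push attempts in total: the first try plus two retries.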

* Fixes part of loklak#1132: Add unit test to check TwitterScraper output (loklak#1133)

* convert markdown file to rst (loklak#1142)

* Merged development, fixed conflict.

* Improve code quality for org.loklak.geo.*

* Related loklak#1070: Improve code quality for org.loklak.api.admin.* (loklak#1149)

* Related to loklak#1070: Improve code quality for org.loklak.Crawler.java

* fix related to loklak#1152: code refactoring for logging (loklak#1153)

* fix related to loklak#1133: fix access specifiers (loklak#1151)

* fixes loklak#1161: Add GCloud Kubernetes deployment document for loklak (loklak#1162)

* fixes loklak#1146: Check for TwitterFactory before getting instance (loklak#1147)

* Related loklak#1070: Fix Codacy issues for org.loklak.api.amazon.* (loklak#1163)

* fix loklak#1143: Fix NumberException in YoutubeScraper (loklak#1157)

* Installation and Start on a user specified port (loklak#1159)

Solves issue: loklak#925

* Fixes loklak#1165: Fixed the QuoraProfileScraper and displaying profileImage

* Related to loklak#1112:Add filter for images, videos (loklak#1164)

* Related loklak#1156: Make harvesting decision biased for Kaizen (loklak#1158)

The harvest probability is computed as queuries.size() / QUERIES_LIMIT and compared with a randomly chosen target probability to decide whether to harvest. If there is no limit on the queue size, the probability to harvest is set to 0.5.
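The biased decision can be sketched as below. The commit does not say which side of the comparison triggers a harvest, so this sketch assumes a harvest happens when the random target falls below the probability (a fuller queue makes harvesting more likely). Treating a non-positive limit as "no limit", the clamp to 1.0, and the class name are also assumptions.

```java
import java.util.Random;

final class KaizenHarvestDecisionSketch {
    // Assumption: a non-positive limit stands for "no limit on the queue size".
    static double harvestProbability(int queueSize, int queriesLimit) {
        if (queriesLimit <= 0) {
            return 0.5;
        }
        // Clamp to 1.0 in case the queue momentarily exceeds its limit.
        return Math.min(1.0, (double) queueSize / queriesLimit);
    }

    // Assumption: harvest when the random target falls below the probability.
    static boolean shallHarvest(int queueSize, int queriesLimit, Random random) {
        double target = random.nextDouble(); // uniform in [0, 1)
        return target < harvestProbability(queueSize, queriesLimit);
    }
}
```

Passing the `Random` in makes the decision reproducible in tests with a fixed seed.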

* Fixes loklak#1167 GithubScraperService able to scrape user specific data (loklak#1168)

Fixes issue loklak#1167. The githubprofilescraper service now displays starred_url,
number of starred repos, followers_url, number of followers, following_url,
and number of people following for a particular user.

* fixes loklak#1114 Improve URL shortening service

* Include all 30X HTTP response codes while checking for redirects.
* Use POST requests as fallback for GET requests. There are many cases (mostly https?://fb.me/*) where GET requests return 400 Bad Request while POST requests work fine. The patch makes a POST attempt in such cases and fetches the result.
* Try to fetch the URL from a <meta/> tag in the response body for non-redirect status codes.
* Check the validity of URL shortening only once, not for each intermediate URL.
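The first three rules above can be captured as small pure helpers, sketched below without any network code. The commit does not say which `<meta/>` tag is read; this sketch assumes an HTML meta refresh tag, and the class and method names are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UnshortenRulesSketch {
    // "All 30X codes": treat every 3xx status as a redirect, not just 301/302.
    static boolean isRedirect(int status) {
        return status >= 300 && status < 400;
    }

    // Fallback rule: retry with POST when a GET answered 400 Bad Request.
    static boolean shouldRetryWithPost(String method, int status) {
        return "GET".equals(method) && status == 400;
    }

    // Assumed tag shape: <meta http-equiv="refresh" content="0; URL=https://...">
    private static final Pattern META_REFRESH = Pattern.compile(
        "<meta[^>]*http-equiv=[\"']?refresh[\"']?[^>]*content=[\"']?\\d+\\s*;\\s*url=([^\"'>\\s]+)",
        Pattern.CASE_INSENSITIVE);

    // Extracts the refresh target from a non-redirect response body, if any.
    static Optional<String> metaRefreshTarget(String html) {
        Matcher m = META_REFRESH.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

A regex is enough for a sketch; a real implementation would likely use an HTML parser to cope with attribute order and entity escaping.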

* Display proper URL to open loklak_server

Solves issue: loklak#1172

Displays the proper localhost URL on which loklak_server is running after
the execution of bin/start.sh or bin/installation.sh with a "p" flag.

Earlier, the displayed localhost URL always ended in port 9000 for bin/start.sh,
and the running port was concatenated onto 9000 for bin/installation.sh.
E.g.:
http://localhost:9000 # bin/start.sh, actual port 8888
http://localhost:90008888 #bin/installation.sh, actual port 8888

* fixes loklak#1177 - Added tests for WordpressCrawlerService.java

fixes issue loklak#1177. Added tests for WordpressCrawlerService.java and
also removed the leading 'Author' from the author field in the JSON
output.

* fix loklak#1176: Fetch debug flag from config file

Change configurations for TwitterScraper and ClientConnection

* fixes loklak#1184 - Instagram Profile Scraper is now working

fixes issue loklak#1184. Instagram scraper is now returning data.

* fix loklak#1179: Use java.net.URL to build relative URL in ClientConnection (loklak#1183)

* fixes loklak#1070: Add test for URL unshortening (loklak#1173)

* fixes loklak#1169 - Added test for Github profile scraper (loklak#1185)

fixes issue loklak#1169. Added tests for the GithubProfileScraper service.

* Improve code quality for some files in org.loklak.api.cms and add checkstyle as gradle task (loklak#1187)

* Related loklak#1070: Improve code quality for some files in org.loklak.api.cms

Fixes were made using Checkstyle with the google_check.xml config and a 4-space indentation level.

* Add checkstyle check as gradle task

* Fixes loklak#1191: NullPointerException in CareTaker.java (loklak#1192)

* Auto-generate docs in dev.loklak.org repository (loklak#1195)

* Fix loklak#1171: Extract video URLs from IFrame (loklak#1193)

Videos are added as an IFrame for Twitter. To fetch the video URLs, we first fetch the IFrame page and then check for the video format. If it is mp4, we're done. If it is m3u8, we need to fetch the m3u8 link in order to get actual videos. Mostly, these videos are of .ts format.

Also add org.unbescape as gradle dependency to unescape string in iframe.
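The mp4/m3u8 branching described above can be sketched as two helpers: one decides whether a playlist must be fetched, the other lists the URI lines of an already-downloaded m3u8 playlist, resolved against the playlist URL. Fetching is left out, and `HlsPlaylistSketch` is a hypothetical name; the parsing follows the basic m3u8 convention that lines starting with `#` are tags or comments.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

final class HlsPlaylistSketch {
    // mp4 URLs can be used directly; m3u8 URLs need the playlist to be fetched first.
    static boolean needsPlaylistFetch(String videoUrl) {
        String path = URI.create(videoUrl).getPath();
        return path != null && path.endsWith(".m3u8");
    }

    // Extracts the URI lines of an m3u8 playlist, resolved against the playlist URL.
    // In a media playlist these are the segments (often .ts); a master playlist
    // would instead list further .m3u8 variant playlists.
    static List<String> playlistEntries(String playlistUrl, String playlist) {
        URI base = URI.create(playlistUrl);
        List<String> entries = new ArrayList<>();
        for (String raw : playlist.split("\\R")) {
            String line = raw.trim();
            // Blank lines are ignored; lines starting with '#' are tags or comments.
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            entries.add(base.resolve(line).toString());
        }
        return entries;
    }
}
```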

* Fix loklak#1201: Break down KaizenHarvester into simpler pieces (loklak#1203)

Introduce a KaizenQuery class to support different ways of storing the queries that Kaizen needs to process.

* Fix loklak#1208: Add .editorconfig (loklak#1209)

* Fixes loklak#1204 Add subtree if not already added (loklak#1207)

* Fix loklak#1205: Extract complete video URLs for Tweets (loklak#1206)

This implementation mimics the video playback flow of Twitter's mobile React app:
    1. Extract BEARER_TOKEN holding script's URL.
    2. Extract guest session token.
    3. Extract BEARER_TOKEN from URL in 1.
    4. Make Twitter API call with the parameters.

* fixes loklak#1196 - Enhanced Quora profile scraper loklak#1199 (loklak#1200)

Fixes issue loklak#1196. The scraper now provides more information, such as the
user's university, the location where the user works, the topics they know,
and the number of followers, questions, edits, blogs, etc.

* Fix loklak#1188: Use unbescape to unescape HTML in html2utf8 (loklak#1194)

Also improve whitespace cleaning in the method. Move old implementation to html2utf8Custom.

* Fixes loklak#1097: Restore access specifiers in TwitterScraper.java (loklak#1198)

* Fix indentation (loklak#1211)

* Fix loklak#1212: fix checkstyle errors(except missing javadoc) (loklak#1218)

* Fixes loklak#1215 fix syntax error in the script (loklak#1217)

* Fix loklak#1213: Include videos for testing TwitterScraper (loklak#1221)

* Fix loklak#1216: Revert "Installation and Start on a user specified port (loklak#1159)" (loklak#1227)

This reverts commit 1e0bcd5.

Conflicts (resolved):
	bin/installation.sh
	bin/start.sh

* Fixes loklak#1202: Modify loggers in Loklak Server for testing (loklak#1222)

* Fixes loklak#1219: Add UTC time in TimeAndDateService (loklak#1220)

* Fixes loklak#1112: Add image, video filter constraints for cache (loklak#1190)

* Fixes loklak#1236: Update Docs for get parameter (loklak#1237)

* Fixes loklak#1226 Build error currently showing (loklak#1228)

* Fixes loklak#1215: Fix relative link

* Update git to work with subtree

* Adding echo statements

* Fix loklak#1239: Correct flag values in config.properties

* Fix loklak#1238: Add PriorityQueue harvesting strategy (loklak#1240)

Also add a score to each Tweet based on its retweet and favourite counts.

* Fix loklak#1251: Correct test case for RedirectUnshortener (loklak#1253)

http://t.co/E3w7s2qdBT now points to http://www.mostviralfeed.com/what-lady-gaga-actually-looks-like instead of http://mostviralfeed.com/what-lady-gaga-actually-looks-like

* Fix loklak#1247: Add function to collect stats about all classes for a classifier (loklak#1248)

* Fix loklak#1256: Add classifier.json endpoint to serve aggregated data (loklak#1257)

* refactoring to have the same naming as in susi_server

* Fixes loklak#1261: RedirectUnshortener link fix (loklak#1262)

* Fixes loklak#1229, loklak#1235, Related loklak#1230: Setup of testable version (loklak#1250)

1) Set up Post and BaseScraper

2) Set up QuoraProfileScraper with BaseScraper and Post

* Fix loklak#1259: Add function for time sensitive aggregation (loklak#1260)

* Fix loklak#1271: Correct redirect link in test (loklak#1272)

* Fix loklak#1266: Allow time based aggregation in /api/classifier.json (loklak#1267)

* Fix loklak#1278: Correct typo in kaizen.md (loklak#1279)

* enhanced elasticsearch mapping

* eclipse classpath to use same as gradle

* removed unused imports

* Fix loklak#1268: Add function for aggregation based on country codes (loklak#1270)

The following operations are now possible:
* All time aggregation for all countries
* Time sensitive aggregation for all countries
* All previous aggregations for selected countries

* Fix loklak#1273: Add Jacoco to provide coverage report in XML format (loklak#1274)

* Fixes loklak#1284: Improve test cases for URL unshortener (loklak#1285)

* Setup post and basescraper with QuoraProfileScraper (loklak#1249)

* Setup of testable version

Set up Post and BaseScraper

* Related loklak#1230, loklak#1231, loklak#1244: Integrate Timeline2 with QuoraProfileScraper

* Configure ssh agent before push
vibhcool added a commit to vibhcool/loklak_server that referenced this pull request Jul 6, 2017
* deploy button info for docker loklak#1001

* Fixes loklak#1045 : Replace the image logo in navigation bar with a text

Fixes loklak#1045 : Replace the image logo in navigation bar with a text

* Fixes loklak#1048: fix execution of method without query string

* fix for latest twitter html change

* add docker status badge

This is related to issue loklak#1049

* update documentation path

Problem: The documentation has moved. The links in the README are outdated
Solution: insert the containing folder into the path of all links to docs

* moved dockerfile to root folder loklak#1049

* changed travis build to new location

* updated Dockerfile path for compose

This is for issue loklak#1049.
The pull request loklak#1050 is a precondition for this to make sense.

Problem: the Dockerfile was moved to /
Solution: adapt the path

* get aggregations also with fresh requests from twitter with source=all

* Fixes loklak#1060: Increase default Xmx value

* Fixed loklak#1059 - Remove and Ignore .DS_Store

* Fixes loklak#1067 - Tweet URL in README is broken

* corrected heading

"Where do I find the java?" ->"Where do I find the Java documentation?"

* Using the note directive of sphinx

See loklak#1042

* README.md upd, useful links added

* Fixes loklak#1033, loklak_server README.md upd, links updated with link syntax

* Move documentation site

The documentation site is now moved to https://github.com/loklak/dev.loklak.org

Closes loklak#1014

* fix username emoji in tweet

* Fix unused imports in python files(codacy issue)

Related to loklak#1070

* removed .DS_Store

* added .DS_Store to gitignore

* Fix use of Null in scala code

Related to loklak#1070

* fixed scraper

* Edited Readme

* Add update trigger script for docs

Closes loklak#1003

* Creating Volume for persistence while deploying via docker, fix loklak#1051 (loklak#1089)

* Update Dockerfile

* Update Dockerfile

* Update docker-compose.yml

* Update docker-compose.yml

* Update docker-compose.yml

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* Update docker-compose.yml

* Update docker-compose.yml

* Update docker-compose.yml

* Update docker-compose.yml

* updated docker build badge

I changed the url so github requests a new image.
The build works.
https://hub.docker.com/r/mariobehling/loklak/builds/

* Docker: Consistent Volume Path

Problem: docker-compose volume path is not the same as the dockerfile volume path
Solution: Set the docker-compose volume path to the dockerfile volume path

You can view the correct path in the Dockerfile:
https://github.com/loklak/loklak_server/blob/7a1f0378dc40ec25eec6083e43558a62408d84e8/Dockerfile#L38
I checked in the container:
```
bash-4.3# ls /loklak_server/
bin              conf             gradlew          settings.gradle
build            data             html             src
build.gradle     gradle           installation     ssi
bash-4.3# ls /
bin            lib            proc           srv            var
dev            loklak_server  root           sys
etc            media          run            tmp
home           mnt            sbin           usr
```
the data directory exists and is filled within `/loklak_server`

* .travis.yml: Add keys for dev.loklak.org

Closes loklak#1091

* fix initGet

* option to autodelete messages after one month from the main index

* disabling feature introduced with
27272ee
for issue loklak#919

The storage of the settings file caused that the settings file was
broken. It blew up to a huge file, like
$ ls -l customized_config.properties
-rw-r--r-- 1 loklak loklak 251650030 Apr 10 19:08
customized_config.properties

This is the main cause that loklak.org was down since this feature was
introduced.

* Fixes loklak#1099 : Changes the href link of the button download, install and extend

* fix loklak#1056 - document how to start contributing (loklak#1063)

* Added JS EventListener to resize dump iframe on load. Closes loklak#1101

* Add Unit Tests to Loklak Server (loklak#1098)

* Add unit tests for TwitterScraper.java

* Add data file to test JSONRandomAccessFileTest.java

* set up unit tests build in loklak Server

* fix changes requested and codacy issues

* fixes scrollbar event

* at the twitter scraper now use more readable version of assert, also fix bug with parse long in youtube scraper(fails on Long.parse method, because spaces are not removed), add unit test for youtube scrapper.

* fix bug with youtube scrapper and add unit test for scraper

* Fixes loklak#1103: Changed the URLs to the correct ones (loklak#1104)

* Fixes loklak#1103: Changed the URLs to the correct ones

* Fixes loklak#1108: Fixed the typos in documentation

* fix and modify the GithubProfileScraper.java

* fixes loklak#961: add query in KaizenHarverster's queue to get older Tweets

In case if the current timeline's query already has an until statement, replace it's date part with the oldest one. Also add DateFormat object in KaizenHarverster to parse Date into String of format yyyy-MM-dd.

* fix eclipse classpath for storing classes (loklak#1097)

* Fixes loklak#1123: Adding Gemnasium Button & Fixing Docker build button

* Fix Codacy issue in Timeline.java. Related loklak#1070

Link to codacy: https://www.codacy.com/app/sudheesh1995/loklak_server/file/6470204147/issues/source?bid=3495500&fileBranchId=3495500
Description: Fields should be declared at the top of the class

* Fix Codacy issue for some files in org.loklak.server.api. Related loklak#1070

* ConsoleService.java
  - Fields should be declared at the top of the class
  - https://www.codacy.com/app/sudheesh1995/loklak_server/file/6484902617/issues/source?bid=3495500&fileBranchId=3495500

* EventBriteCrawler.java
  - Make spacing consistent for conditionals

* GraphServlet.java
  - Reduce complexity of doGet method
  - https://www.codacy.com/app/sudheesh1995/loklak_server/file/6484903642/issues/source?bid=3495500&fileBranchId=3495500

* Rename Dockerfile-learnings.md to docs/Dockerfile-learnings.md

* fix loklak#1138: Correct spelling mistake in README.md (loklak#1140)

Change "descripe" to "describe" in How to Contribute section.

* Fixes loklak#1123: Adding Gemnasium Button & Fixing Docker build button in rst file (loklak#1137)

* Related loklak#1070: Fix Codacy issues for files in org.loklak.api.search (loklak#1134)

* EventBriteCrawlerService.java
  - Use one line for each declaration
  - https://www.codacy.com/app/sudheesh1995/loklak_server/file/6497733425/issues/source?bid=3495500&fileBranchId=3495500

* GenericScraper.java
  - Indentation fix
  - New line before EOF

* GithubProfileScraper.java
  - Remove trailing whitespaces

* MeetupsCrawlerService.java
  - Use one line for each declaration
  - https://www.codacy.com/app/sudheesh1995/loklak_server/file/6497733676/issues/source?bid=3495500&fileBranchId=3495500

* SearchServlet.java
  - Indentation fix

* SuggestServlet.java
  - Position literals first in String comparisons
  - Fields should be declared at the top of the class
  - https://www.codacy.com/app/sudheesh1995/loklak_server/file/6497733640/issues/source?bid=3495500&fileBranchId=3495500

* WeiboUserInfo.java
  - Switch statements should have a default label
  - https://www.codacy.com/app/sudheesh1995/loklak_server/file/6497733550/issues/source?bid=3495500&fileBranchId=3495500

* Fixes loklak#1139: Changed the URL (loklak#1141)

* Fixes loklak#1139: Changed the URL

* Fixes loklak#1139: Changed the URL

* Fix "Strings must use doublequote. (quotes)"
Related to loklak#1070

* Fix loklak#1070: Strings must use doublequote. (quotes), no-use-before-define

* Fix loklak#1070: Strings must use doublequote. (quotes), no-use-before-define

* Fixes loklak#1070:Strings must use double quotes, no-use-before-define

* Related to loklak#1070:Strings must use double quotes, no-use-before-define

* Related loklak#1058: Add Kaizen harvester usage documentation (loklak#1145)

* Fix loklak#1070: Strings must use doublequote. (quotes), no-use-before-define (loklak#1121)

* Fix "Strings must use doublequote. (quotes)"
Related to loklak#1070

* Fix loklak#1070: Strings must use doublequote. (quotes), no-use-before-define

* Fix loklak#1070: Strings must use doublequote. (quotes), no-use-before-define

* fix loklak#1130: Make retries and back off parameter for backend push configurable (loklak#1131)

These variables can be set from config.properties by changing/defining caretaker.backendpush.retries and caretaker.backendpush.backoff respectively.

* Fixes part of loklak#1132: Add unit test to check TwitterScraper output (loklak#1133)

* convert markdown file to rst (loklak#1142)

* Merged development fixed conflict.

* Improve code quality for org.loklak.geo.*

* Related loklak#1070: Improve code quality for org.loklak.api.admin.* (loklak#1149)

* Related to loklak#1070: Improve code quality for org.loklak.Crawler.java

* fix related to loklak#1152: code refractoring for logging (loklak#1153)

* fix related to loklak#1133: fix access specifiers (loklak#1151)

* fixes loklak#1161: Add GCloud Kubernetes deployment document for loklak (loklak#1162)

* fixes loklak#1146: Check for TwitterFactory before getting instance (loklak#1147)

* Related loklak#1070: Fix Codacy issues for org.loklak.api.amazon.* (loklak#1163)

* fix loklak#1143: Fix NumberException in YoutubeScraper (loklak#1157)

* Installation and Start on a user specified port (loklak#1159)

Solves issue: loklak#925

* Fixes loklak#1165: Fixed the QuoraProfileScraper and displaying profileImage

* Related to loklak#1112:Add filter for images, videos (loklak#1164)

* Related loklak#1156: Make harvesting decision biased for Kaizen (loklak#1158)

A probability is chosen as queuries.size() / QUERIES_LIMIT, which is compared to a randomly chosen target probability and decision is taken accordingly. In case of no limit on the queue size, probability to harvest is set to 0.5.

* Fixes loklak#1167 GithubScraperService able to scrape user specific data (loklak#1168)

Fixes issue loklak#1167.githubprofilescraper service now displays starred_url,
number of starred repos,followers_url, number of followers, following_url,
number of people following for a particuler user.

* fixes loklak#1114 Improve URL shortening service

* Include all 30X HTTP response code while checking for redirect.
* Use POST requests as fallback for GET requests - There are many cases (mostly https?://fb.me/*) when GET requests give status 400: Bad Request, while POST request works fine. The patch will allow to make an attemt for POST request for such cases and fetch the result.
* Try to fetch URL from <meta/> tag in response body in case of non redirect status code.
* Check the validity of URL shortening only once, and not for each intermediate URL.

* Displays proper url to open loklak_server

Solves issue: loklak#1172

Displays proper localhost url in which loklak_server is running after
the execution of bin/start.sh or bin/installation.sh with a "p" flag.

Earlier the localhost url only displayed port 9000 at the end in case of
bin/start.sh and concatenated the running port with 9000 in case of
bin/stop.sh.Ex:
http://localhost:9000 # bin/start.sh, actual port 8888
http://localhost:90008888 #bin/installation.sh, actual port 8888

* fixes loklak#1177 - Added tests for WordpressCrawlerService.java

fixes issue loklak#1177. Added tests for WordpressCrawlerService.java and
also removed the leading 'Author' from the author field in json
output.

* fix loklak#1176: Fetch debug flag from config file

Change configurations for TwitterScraper and ClientConnection

* fixes loklak#1184 - Instagram Profile Scraper is now working

fixes issue loklak#1184. Instagram scraper is now returning data.

* fix loklak#1179: Use java.net.URL to build relative URL in ClientConnection (loklak#1183)

* fixes loklak#1070: Add test for URL unshortening (loklak#1173)

* fixes loklak#1169 - Added test for Github profile scraper (loklak#1185)

fixes issue loklak#1169, Added tests for GithubProfileScraper service.

* Improve code quality for some files in org.loklak.api.cms and add checkstyle as gradle task (loklak#1187)

* Related loklak#1070: Improve code quality for some files in org.loklak.api.cms

Fixes are done using checkstyle with google_check.xml config and 4 space indentation level

* Add checkstyle check as gradle task

* Fixes loklak#1191: NullPointerException in CareTaker.java (loklak#1192)

* Auto-generate docs in dev.loklak.org repository (loklak#1195)

* Fix loklak#1171: Extract video URLs from IFrame (loklak#1193)

Videos are added as an IFrame for Twitter. To fetch the video URLs, we first fetch the IFrame page and then check for the video format. If it is mp4, we're done. If it is m3u8, we need to fetch the m3u8 link in order to get actual videos. Mostly, these videos are of .ts format.

Also add org.unbescape as gradle dependency to unescape string in iframe.

* FIx loklak#1201: Break down KaizenHarvester into simpler pieces (loklak#1203)

Introduce KaizenQuery class to support different methods to store queries that Kaizen needs to process

* Fix loklak#1208: Add .editorconfig (loklak#1209)

* Fixes loklak#1204 Add subtree if not already added (loklak#1207)

* Fix loklak#1205: Extract complete video URLs for Tweets (loklak#1206)

This implementation mimics the video playback flow of mobile react app of Twitter.
    1. Extract BEARER_TOKEN holding script's URL.
    2. Extract guest session token.
    3. Extract BEARER_TOKEN from URL in 1.
    4. Make Twitter API call with the parameters.

* fixes loklak#1196 - Enhanced Quora profile scraper loklak#1199 (loklak#1200)

Fixes issue loklak#1196. The scraper now provides more information, such as the
user's university, where the user works, the topics they know, and the number of
followers, questions, edits, blogs, etc.

* Fix loklak#1188: Use unbescape to unescape HTML in html2utf8 (loklak#1194)

Also improve whitespace cleaning in the method. Move old implementation to html2utf8Custom.

* Fixes loklak#1097: Restore access specifiers in TwitterScraper.java (loklak#1198)

* Fix indentation (loklak#1211)

* Fix loklak#1212: fix checkstyle errors (except missing Javadoc) (loklak#1218)

* Fixes loklak#1215 fix syntax error in the script (loklak#1217)

* Fix loklak#1213: Include videos for testing TwitterScraper (loklak#1221)

* Fix 1216: Revert "Installation and Start on a user specified port (loklak#1159)" (loklak#1227)

This reverts commit 1e0bcd5.

Conflicts (resolved):
	bin/installation.sh
	bin/start.sh

* Fixes loklak#1202: Modify loggers in Loklak Server for testing (loklak#1222)

* Fixes loklak#1219: Add UTC time in TimeAndDateService (loklak#1220)
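
Producing a UTC time in Java is straightforward with `java.time`. This is a standalone sketch of the technique, not the TimeAndDateService implementation; the helper names are hypothetical.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helpers showing how a UTC time can be produced with java.time.
class UtcTimeSketch {
    // ISO-8601 UTC timestamp for an instant, e.g. "2017-07-04T06:30:00Z".
    static String isoUtc(Instant instant) {
        return DateTimeFormatter.ISO_INSTANT.format(instant);
    }

    // Converts a wall-clock time in a named zone to the corresponding UTC wall-clock time.
    static LocalDateTime toUtc(LocalDateTime local, ZoneId zone) {
        return local.atZone(zone).withZoneSameInstant(ZoneOffset.UTC).toLocalDateTime();
    }
}
```

For instance, 12:00 on 2017-07-04 in `Asia/Kolkata` (UTC+5:30) converts to 06:30 UTC.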

* Fixes loklak#1112: Add image, video filter constraints for cache (loklak#1190)

* Fixes loklak#1236: Update Docs for get parameter (loklak#1237)

* Fixes loklak#1226 Build error currently showing (loklak#1228)

* Fixes 1215 Fix relative link

* Update git to work with subtree

* Adding echo statements

* Fix loklak#1239: Correct flag values in config.properties

* Fix loklak#1238: Add PriorityQueue harvesting strategy (loklak#1240)

Also add a score for each Tweet based on its retweet and favourite counts.
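
A priority-queue harvesting strategy can be sketched with `java.util.PriorityQueue` and a comparator on the score. The names are hypothetical and the score formula (a plain sum of retweets and favourites) is an assumption; the actual weighting may differ.

```java
import java.util.PriorityQueue;

// Hypothetical sketch: tweets ordered by an engagement score so that the
// highest-scoring one is processed first.
class PriorityHarvestSketch {
    static final class ScoredTweet {
        final String text;
        final int retweets;
        final int favourites;

        ScoredTweet(String text, int retweets, int favourites) {
            this.text = text;
            this.retweets = retweets;
            this.favourites = favourites;
        }

        // ASSUMPTION: a plain sum; the real scoring formula may weight the counts differently.
        int score() {
            return retweets + favourites;
        }
    }

    // Max-heap on score: poll() returns the highest-scoring tweet.
    static PriorityQueue<ScoredTweet> newQueue() {
        return new PriorityQueue<>((a, b) -> Integer.compare(b.score(), a.score()));
    }
}
```

`PriorityQueue` is a min-heap by default, so the comparator is reversed to make `poll()` return the most engaging tweet first.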

* Fix loklak#1251: Correct test case for RedirectUnshortener (loklak#1253)

http://t.co/E3w7s2qdBT now points to http://www.mostviralfeed.com/what-lady-gaga-actually-looks-like instead of http://mostviralfeed.com/what-lady-gaga-actually-looks-like

* Fix loklak#1247: Add function to collect stats about all classes for a classifier (loklak#1248)

* Fix loklak#1256: Add classifier.json endpoint to serve aggregated data (loklak#1257)

* refactoring to have the same naming as in susi_server

* Fixes loklak#1261: RedirectUnshortener link fix (loklak#1262)

* Fixes loklak#1229, loklak#1235, Related loklak#1230: Setup of testable version (loklak#1250)

1) setup post and basescraper

2) Setup quoraprofilescraper with basescraper and post
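
The PR describes an abstract BaseScraper inherited by all scrapers, with results represented as Post objects. A minimal sketch of that shape, assuming a fetch/parse split (a template-method design that is an assumption here), could look like this. None of these names are the actual loklak classes, and the fake scraper returns a canned page instead of making network requests.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a "base scraper + post" design.
class ScraperSketch {
    // A post is modelled here as an ordered key/value record.
    static final class Post {
        final Map<String, Object> fields = new LinkedHashMap<>();

        Post put(String key, Object value) {
            fields.put(key, value);
            return this;
        }
    }

    // Every scraper supplies fetching and parsing; scrape() ties them together.
    abstract static class BaseScraper {
        protected abstract String fetch(String query);

        protected abstract List<Post> parse(String page);

        final List<Post> scrape(String query) {
            return parse(fetch(query));
        }
    }

    // Offline stand-in for a profile scraper: "fetches" a canned page of key=value lines.
    static final class FakeProfileScraper extends BaseScraper {
        @Override
        protected String fetch(String query) {
            return "user=" + query + "\nfollowers=42";
        }

        @Override
        protected List<Post> parse(String page) {
            Post post = new Post();
            for (String line : page.split("\n")) {
                String[] kv = line.split("=", 2);
                if (kv.length == 2) {
                    post.put(kv[0], kv[1]);
                }
            }
            List<Post> posts = new ArrayList<>();
            posts.add(post);
            return posts;
        }
    }
}
```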

* Fix loklak#1259: Add function for time sensitive aggregation (loklak#1260)

* Fix loklak#1271: Correct redirect link in test (loklak#1272)

* Fix loklak#1266: Allow time based aggregation in /api/classifier.json (loklak#1267)

* Fix loklak#1278: Correct typo in kaizen.md (loklak#1279)

* enhanced elasticsearch mapping

* eclipse classpath to use same as gradle

* removed unused imports

* Fix loklak#1268: Add function for aggregation based on country codes (loklak#1270)

The following operations are now possible:
* All time aggregation for all countries
* Time sensitive aggregation for all countries
* All previous aggregations for selected countries
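
The aggregation modes listed above (all time or a time window, all countries or selected ones) can be expressed as one counting function with optional filters. This is a hypothetical sketch: the `Classified` record, the half-open time window, and the use of `null` to mean "no filter" are assumptions, not the /api/classifier.json implementation.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class ClassifierAggregationSketch {
    // Hypothetical classified message: a class label, a timestamp, and a country code.
    static final class Classified {
        final String label;
        final long timeMillis;
        final String countryCode;

        Classified(String label, long timeMillis, String countryCode) {
            this.label = label;
            this.timeMillis = timeMillis;
            this.countryCode = countryCode;
        }
    }

    // Counts messages per class label.
    // since/until bound a half-open window [since, until); null means unbounded.
    // countries restricts by country code; null or empty means all countries.
    static Map<String, Integer> aggregate(List<Classified> items, Long since, Long until, Set<String> countries) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Classified c : items) {
            if (since != null && c.timeMillis < since) continue;
            if (until != null && c.timeMillis >= until) continue;
            if (countries != null && !countries.isEmpty() && !countries.contains(c.countryCode)) continue;
            counts.merge(c.label, 1, Integer::sum);
        }
        return counts;
    }
}
```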

* Fix loklak#1273: Add Jacoco to provide coverage report in XML format (loklak#1274)

* Fixes 1284: Improve test cases for URL unshortener (loklak#1285)

* Setup post and basescraper with QuoraProfileScraper (loklak#1249)

* Setup of testable version

setup post and basescraper

* Related loklak#1230, 1231, 1244: integrate Timeline2 with quorascraper
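
Per the PR description, Timeline2 is a temporary copy of Timeline that iterates Post objects instead of MessageEntry objects. A hypothetical stand-in is sketched below; the replace-by-id and newest-first ordering rules are assumptions chosen for illustration, not confirmed behaviour of Timeline2.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class Timeline2Sketch {
    static final class Post {
        final String id;
        final long timestamp;

        Post(String id, long timestamp) {
            this.id = id;
            this.timestamp = timestamp;
        }
    }

    // Hypothetical timeline over Post objects.
    static final class Timeline implements Iterable<Post> {
        // Keyed by id, so adding a post with an existing id replaces the old one.
        private final Map<String, Post> byId = new LinkedHashMap<>();

        void add(Post post) {
            byId.put(post.id, post);
        }

        int size() {
            return byId.size();
        }

        // Iterates over a snapshot sorted newest first.
        @Override
        public Iterator<Post> iterator() {
            List<Post> sorted = new ArrayList<>(byId.values());
            sorted.sort((a, b) -> Long.compare(b.timestamp, a.timestamp));
            return sorted.iterator();
        }
    }
}
```

Implementing `Iterable<Post>` lets callers use a for-each loop over the timeline regardless of how the posts are stored internally.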

* Configure ssh agent before push
vibhcool added a commit to vibhcool/loklak_server that referenced this pull request Jul 15, 2017
* Setup of testable version

setup post and basescraper

* Related loklak#1230, 1231, 1244: integrate Timeline2 with quorascraper