Git versioning integration #9

vsreekanti · 2016-08-30T23:07:24Z

This integration has three parts.

1. We need to build an API in Ground that listens for GitHub webhooks for certain repositories that are registered with Ground.
2. Send these events to an Aboveground server.
3. Have the Aboveground server clone the repository, analyze the git history (there should be a good way to calculate deltas instead of analyzing the whole git history), and update Ground with the new versions that were detected by Ground.
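One possible way to calculate deltas, sketched here as an assumption rather than as what Aboveground actually does: a GitHub push event carries `before` and `after` commit SHAs, and `git rev-list --parents before..after` lists only the commits added by that push, each followed by its parents. The function names `parse_rev_list` and `new_versions` are hypothetical.

```python
import subprocess


def parse_rev_list(output):
    """Parse `git rev-list --parents` output into {commit_sha: [parent_shas]}."""
    versions = {}
    for line in output.splitlines():
        if line.strip():
            sha, *parents = line.split()
            versions[sha] = parents
    return versions


def new_versions(repo_dir, before, after):
    """Return only the commits reachable from `after` but not from `before`.

    Assumes `repo_dir` is an up-to-date local clone and that both SHAs exist in it.
    """
    result = subprocess.run(
        ["git", "-C", repo_dir, "rev-list", "--parents", f"{before}..{after}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_rev_list(result.stdout)
```

Each returned entry maps a new commit to its parents, which is the information needed to register a version together with its parent versions. A push that creates a new branch reports an all-zero `before` SHA, so that case would need separate handling.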

This depends on the pipeline that reads from Kafka and writes into Ground specified in #8.

tarpdalton · 2016-09-15T16:10:58Z

Do we also want Ground to manage an API where a user/application could POST and it would write a message to one of Ground's Kafka topics? This would be for other types of webhooks.

tarpdalton · 2016-09-15T16:13:24Z

I also need to add a Python script for setting up the GitHub webhook. For the script, the user would provide:

repo URL
credentials
Ground GitHub webhook API URL
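A minimal sketch of what such a setup script could look like, assuming the credentials are a GitHub personal access token and that the webhook is registered through GitHub's `POST /repos/{owner}/{repo}/hooks` endpoint. The function name `build_hook_request` and the example Ground URL are hypothetical; the request is built but not sent.

```python
import json
import urllib.request
from urllib.parse import urlparse


def build_hook_request(repo_url, token, ground_webhook_url):
    """Build (but do not send) the request that registers a push webhook."""
    # Extract "owner/repo" from a URL such as https://github.com/owner/repo(.git)
    path = urlparse(repo_url).path.strip("/")
    if path.endswith(".git"):
        path = path[:-4]
    owner, repo = path.split("/")[:2]

    payload = {
        "name": "web",          # GitHub requires the literal name "web" for webhooks
        "active": True,
        "events": ["push"],     # only push events are needed to see new commits
        "config": {"url": ground_webhook_url, "content_type": "json"},
    }
    return urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}/hooks",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"token {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the registration. Keeping the construction separate from the network call makes the script easy to test without touching GitHub.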

tarpdalton · 2016-09-23T17:42:07Z

This is how I am POSTing each version/commit:

{
    "tags": {
        "branch": {
            "key": "branch",
            "value": "refs/branches/master",
            "type": "string"
        },
        "commit": {
            "key": "commit",
            "value": "8464793044eff62c794f88c06ea286e427efc8b9",
            "type": "string"
        }
    },
    "structureVersionId": null,
    "reference": null,
    "nodeId": "Nodes.ground",
    "parameters": {}
}

Do you think this is the best way to do it?
Should nodeId be the repo ID (153213) instead of the repo name? Repo names are not unique and could cause problems.
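For illustration, the payload above could be assembled per commit with a small helper. This is only a sketch: `tag` and `node_version_payload` are hypothetical names, and the field layout simply mirrors the JSON shown in this comment.

```python
import json


def tag(key, value, type_="string"):
    """Build one tag entry in the shape used by the payload above."""
    return {"key": key, "value": value, "type": type_}


def node_version_payload(node_id, branch, commit):
    """Build the JSON body for one node version (one git commit)."""
    tags = [tag("branch", branch), tag("commit", commit)]
    return {
        "tags": {t["key"]: t for t in tags},  # tags are keyed by their own "key"
        "structureVersionId": None,
        "reference": None,
        "nodeId": node_id,  # could be derived from the repo name or the repo ID
        "parameters": {},
    }


body = json.dumps(
    node_version_payload(
        "Nodes.ground",
        "refs/branches/master",
        "8464793044eff62c794f88c06ea286e427efc8b9",
    )
)
```

Because `node_id` is a plain argument, switching from the repo name to the repo ID would only change the caller.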

tarpdalton · 2016-09-23T21:04:28Z

I'm facing a problem with the nodes/versions/ API. I'm listing the parent versions in the URL, and when I get to about 30 versions the API starts taking a long time and then stops working altogether. I'm using Neo4j as my backend. If I don't list any parents in the URL, it works fine and doesn't slow down. I'll look into the API and see if I can figure out what might be causing it.

tarpdalton · 2016-09-23T22:00:29Z

It looks like the problem occurs when getting the DAG. If the path along the versions is too long, it causes problems:
https://github.com/ground-context/ground/blob/master/ground-core/src/main/java/edu/berkeley/ground/api/versions/neo4j/Neo4jVersionHistoryDAGFactory.java#L45

vsreekanti · 2016-09-23T22:06:59Z

Oh, I see. That's using a Neo4j feature for recursive queries basically. We might be able to do things a little faster if we create an index on VersionSuccessor in Neo4j. Do you want to try that?

tarpdalton · 2016-09-28T16:46:56Z

I've used Gremlin more than Cypher, so I'm still trying to figure out how to create an index to help the query go faster. It looks like it's easy to create an index on a property type, but that doesn't really help us.

vsreekanti · 2016-09-28T21:31:57Z

Hm, what makes you say that creating an index on the VersionSuccessor property type will not help?

Do you think you could set up a Neo4j server in the cloud somewhere with this data, so we can both play with the data to figure out how to make this go faster?

tarpdalton · 2016-09-28T21:51:09Z

I guess I thought VersionSuccessor was a relationship not a property type.
I ran CREATE INDEX ON :NodeVersion(VersionSuccessor) yesterday and it didn't improve performance.

vsreekanti · 2016-09-29T16:40:54Z

Hm. I think that creates an index on the VersionSuccessor property of the vertices labeled NodeVersion. It's not clear to me why a path of length 50 has such a big performance hit. Could you try doing something like [:VersionSuccessor*..100] (only find paths of length up to 100)?

I'm going to look into this a little bit more, but if Neo4j can't handle these queries, then we might have to drop it and go back to Postgres...

tarpdalton · 2016-09-29T21:55:57Z

OK, I think it might be working. I'll test it more.

tarpdalton · 2016-09-30T19:50:56Z

Still not working, but this query might work for getting similar information. It's faster, but it doesn't include the starting node. I'll see if I can get the query to include it.

MATCH (a:NodeVersion { node_id : 'Nodes.ground' }) -[r:VersionSuccessor] - (b:NodeVersion {node_id : 'Nodes.ground'})
RETURN DISTINCT r

Nodes.ground is the initial node

tarpdalton · 2016-09-30T19:59:17Z

MATCH (a)-[r:VersionSuccessor]-(b:NodeVersion {node_id : 'Nodes.ground'})
WHERE a.id='Nodes.ground'
OR a.node_id='Nodes.ground'
RETURN DISTINCT r

This will include the original node.

vsreekanti · 2016-09-30T20:45:04Z

Interesting. This works recursively?

tarpdalton · 2016-10-03T16:35:02Z

I don't think the new one does it recursively. It gets all relationships that come from or go to a node_id "Nodes.ground", with label "VersionSuccessor". Then it combines them all to get a path.

When I ran the old query:

MATCH (a:Node {id: 'Nodes.engine' })-[e:VersionSuccessor*..20]->(b:NodeVersion)
RETURN e

It returned 9736 rows, so I think it is finding every possible path.
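To illustrate why enumerating every path can return far more rows than there are relationships, here is a small sketch. It assumes the version history contains merges, modeled as a chain of diamond shapes; the helper names and the graph are hypothetical, not the actual Ground data.

```python
from collections import defaultdict


def diamond_chain(k):
    """Build k diamonds in a row: v_i -> a_i -> v_{i+1} and v_i -> b_i -> v_{i+1}."""
    edges = []
    for i in range(k):
        for mid in (f"a{i}", f"b{i}"):
            edges.append((f"v{i}", mid))
            edges.append((mid, f"v{i + 1}"))
    return edges


def count_paths(edges, start, max_len):
    """Count every directed path of length 1..max_len that begins at `start`."""
    succ = defaultdict(list)
    for a, b in edges:
        succ[a].append(b)
    total = 0
    frontier = {start: 1}  # node -> number of distinct paths ending at that node
    for _ in range(max_len):
        nxt = defaultdict(int)
        for node, n in frontier.items():
            for child in succ[node]:
                nxt[child] += n
        total += sum(nxt.values())
        frontier = nxt
    return total


edges = diamond_chain(5)
print(len(edges), count_paths(edges, "v0", 20))  # prints "20 124"
```

With only 5 merges, 20 relationships already produce 124 paths, and the count roughly doubles with each additional diamond. A query that returns distinct relationships grows only with the number of edges.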

vsreekanti added this to the v0.1 milestone Aug 30, 2016

vsreekanti mentioned this issue Aug 31, 2016

Getting Started with Ground v0.1 Docs #15

Closed

vsreekanti assigned tarpdalton Aug 31, 2016

tarpdalton mentioned this issue Sep 16, 2016

Implement github repo ingestion #19

Merged

vsreekanti closed this as completed in 2638c89 Oct 3, 2016
