Git versioning integration #9

Closed
vsreekanti opened this issue Aug 30, 2016 · 15 comments

@vsreekanti
Member

This integration has three parts.

  1. We need to build an API in Ground that listens for GitHub webhooks for certain repositories that are registered with Ground.
  2. Send these events to an Aboveground server.
  3. Have the Aboveground server clone the repository, analyze the git history (there should be a good way to calculate deltas instead of analyzing the whole git history), and update Ground with the new versions that were detected by Ground.

This depends on the pipeline that reads from Kafka and writes into Ground specified in #8.
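
As a rough sketch of the "calculate deltas" idea in step 3: if the Aboveground server remembers the last head it processed, it only needs the commits reachable from the new head that are not reachable from the old one. That is the same set `git rev-list <old>..<new>` reports, and a GitHub push event carries both ends as `before` and `after`. The `parents` dict below is a hypothetical in-memory stand-in for the cloned repository, and the function names are illustrative only.

```python
from collections import deque


def reachable(parents, head):
    """Return every commit reachable from `head` by following parent links."""
    seen = set()
    queue = deque([head])
    while queue:
        sha = queue.popleft()
        if sha in seen:
            continue
        seen.add(sha)
        queue.extend(parents.get(sha, []))
    return seen


def new_commits(parents, old_head, new_head):
    """Return commits reachable from new_head but not from old_head.

    `parents` maps a commit id to the list of its parent ids (hypothetical
    stand-in for the repository). Pass old_head=None on the first run.
    """
    already_known = reachable(parents, old_head) if old_head else set()
    delta, seen = [], set()
    stack = [new_head]
    while stack:
        sha = stack.pop()
        # Stop descending as soon as we hit history that was already processed.
        if sha in seen or sha in already_known:
            continue
        seen.add(sha)
        delta.append(sha)
        stack.extend(parents.get(sha, []))
    return delta
```

Only the commits in the returned list would need to be written to Ground; everything behind `old_head` is never revisited by the second walk.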

@tarpdalton
Contributor

Do we also want Ground to manage an API where a user or application could POST a message that Ground would then write to one of its Kafka topics? That would cover other types of webhooks.

@tarpdalton
Contributor

I also need to add a Python script for setting up the GitHub webhook. For the script, the user would provide:

  • repo URL
  • credentials
  • Ground GitHub webhook API URL
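
A minimal sketch of what that script could look like, using only the standard library. It builds the request for GitHub's "create a repository webhook" endpoint (`POST /repos/{owner}/{repo}/hooks`) from the three inputs listed above. The function name is hypothetical, the credentials are assumed to be a personal access token, and the Ground URL in the usage note is a placeholder, not a real Ground route.

```python
import json
import urllib.request
from urllib.parse import urlparse


def build_webhook_request(repo_url, token, ground_hook_url):
    """Build (but do not send) the request that registers a push webhook."""
    # "https://github.com/owner/repo(.git)" -> "owner", "repo"
    path = urlparse(repo_url).path.strip("/")
    if path.endswith(".git"):
        path = path[:-4]
    owner, repo = path.split("/")[:2]

    body = {
        "name": "web",          # GitHub requires the literal name "web"
        "active": True,
        "events": ["push"],     # only push events are needed for versioning
        "config": {"url": ground_hook_url, "content_type": "json"},
    }
    return urllib.request.Request(
        url=f"https://api.github.com/repos/{owner}/{repo}/hooks",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"token {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it would be `urllib.request.urlopen(build_webhook_request(repo_url, token, "http://localhost:9090/github"))`, where the last argument is whatever URL the Ground webhook API ends up listening on.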

@tarpdalton
Contributor

This is how I am POSTing each version/commit:

{
    "tags": {
        "branch": {
            "key": "branch",
            "value": "refs/branches/master",
            "type": "string"
        },
        "commit": {
            "key": "commit",
            "value": "8464793044eff62c794f88c06ea286e427efc8b9",
            "type": "string"
        }
    },
    "structureVersionId": null,
    "reference": null,
    "nodeId": "Nodes.ground",
    "parameters": {}
}

Do you think this is the best way to do it?
Should nodeId be the repo id (153213) instead of the repo name? Repo names are not unique and could cause problems.
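
To make the two options concrete, here is a small sketch that turns a GitHub push event into one payload per commit, in the same shape as the JSON above. The `ref`, `commits[].id`, `repository.id` and `repository.name` fields come from the push event. The helper names and the `Nodes.<repo id>` form are assumptions for illustration, not something Ground defines.

```python
def string_tag(key, value):
    """Build one tag entry in the shape used by the payload above."""
    return {"key": key, "value": value, "type": "string"}


def push_event_to_versions(event, use_repo_id=True):
    """Turn a GitHub push event into one node-version payload per commit."""
    repo = event["repository"]
    # Repository ids are unique on GitHub; names can collide across owners.
    identifier = repo["id"] if use_repo_id else repo["name"]
    node_id = f"Nodes.{identifier}"  # assumed naming, mirrors "Nodes.ground"
    return [
        {
            "tags": {
                "branch": string_tag("branch", event["ref"]),
                "commit": string_tag("commit", commit["id"]),
            },
            "structureVersionId": None,
            "reference": None,
            "nodeId": node_id,
            "parameters": {},
        }
        for commit in event["commits"]
    ]
```

With `use_repo_id=True` two repositories that share a name map to different nodes; with `use_repo_id=False` they would both become `Nodes.<name>`.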

@tarpdalton
Contributor

I'm facing a problem with the nodes/versions/ API. I'm listing the parent versions in the URL, and when I get to about 30 versions the API starts taking a long time and then stops working altogether. I'm using Neo4j as my backend. If I don't list any parents in the URL, it works fine and doesn't slow down. I'll look into the API and see if I can figure out what might be causing it.

@tarpdalton
Contributor

It looks like the problem is in getting the DAG. If the path along the versions is too long, it causes problems:
https://github.com/ground-context/ground/blob/master/ground-core/src/main/java/edu/berkeley/ground/api/versions/neo4j/Neo4jVersionHistoryDAGFactory.java#L45

@vsreekanti
Member Author

vsreekanti commented Sep 23, 2016

Oh, I see. That's using a Neo4j feature for recursive queries basically. We might be able to do things a little faster if we create an index on VersionSuccessor in Neo4j. Do you want to try that?

@tarpdalton
Contributor

I've used Gremlin more than Cypher, so I'm still trying to figure out how to create an index to help the query go faster. It looks like it's easy to create an index on a property type, but that doesn't really help us.

@vsreekanti
Member Author

Hm, what makes you say that creating an index on the VersionSuccessor property type will not help?

Do you think you could set up a Neo4j server in the cloud somewhere with this data, so we can both play with the data to figure out how to make this go faster?

@tarpdalton
Contributor

I guess I thought VersionSuccessor was a relationship not a property type.
I ran CREATE INDEX ON :NodeVersion(VersionSuccessor) yesterday and it didn't improve performance.

@vsreekanti
Member Author

Hm. I think that creates an index on the VersionSuccessor property of the vertices labeled NodeVersion. It's not clear to me why a path of length 50 has such a big performance hit. Could you try doing something like [:VersionSuccessor*..100] (only find paths of length up to 100)?

I'm going to look into this a little bit more, but if Neo4j can't handle these queries, then we might have to drop it and go back to Postgres...

@tarpdalton
Contributor

OK, I think it might be working. I'll test it more.

@tarpdalton
Contributor

Still not working, but this query might work for getting similar information. It's faster, but it doesn't include the starting node. I'll see if I can get the query to include it.

MATCH (a:NodeVersion {node_id: 'Nodes.ground'})-[r:VersionSuccessor]-(b:NodeVersion {node_id: 'Nodes.ground'})
RETURN DISTINCT r

Nodes.ground is the initial node

@tarpdalton
Contributor

MATCH (a)-[r:VersionSuccessor]-(b:NodeVersion {node_id : 'Nodes.ground'})
WHERE a.id='Nodes.ground'
OR a.node_id='Nodes.ground'
RETURN DISTINCT r

This will include the original node.

@vsreekanti
Member Author

Interesting. This works recursively?

@tarpdalton
Contributor

tarpdalton commented Oct 3, 2016

I don't think the new one does it recursively. It gets all VersionSuccessor relationships that come from or go to a node whose node_id is "Nodes.ground". Then it combines them all to get a path.
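
The "combine them" step could happen on the client side. A rough sketch, assuming the query result has already been reduced to `(from_id, to_id)` pairs (that row format and both function names are hypothetical): group the edges by source, then walk from the starting node, visiting each version once.

```python
from collections import defaultdict, deque


def build_history(edges):
    """Group (from_id, to_id) successor edges into a parent -> children map."""
    children = defaultdict(list)
    for src, dst in edges:
        children[src].append(dst)
    return children


def walk_from(children, start):
    """Visit every version reachable from `start` once, breadth first."""
    order, seen = [], {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        order.append(current)
        for nxt in children.get(current, []):
            # A version reached through two branches is only queued once.
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order
```

This does work proportional to the number of edges, so a merge in the history does not multiply the amount of data that has to be processed.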

When I ran the old query:

MATCH (a:Node {id: 'Nodes.engine' })-[e:VersionSuccessor*..20]->(b:NodeVersion)
RETURN e

It returned 9736 rows, so I think it is finding every possible path.
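
Here is a small sketch of why the row count could blow up if the variable-length match really returns one row per path. It counts paths of up to `max_len` edges in a toy history made of branch-and-merge "diamonds". The graph shape and the function names are made up for illustration and are not taken from the real data.

```python
def count_paths(children, start, max_len):
    """Count directed paths of 1..max_len edges starting at `start`.

    This mirrors a variable-length pattern that yields one row per path,
    not one row per edge.
    """
    total = 0
    frontier = {start: 1}  # node -> number of distinct paths ending there
    for _ in range(max_len):
        nxt = {}
        for node, ways in frontier.items():
            for child in children.get(node, []):
                nxt[child] = nxt.get(child, 0) + ways
        total += sum(nxt.values())
        frontier = nxt
    return total


def diamond_chain(n):
    """Hypothetical history: n consecutive branch-and-merge diamonds."""
    children = {}
    for i in range(n):
        children[f"m{i}"] = [f"a{i}", f"b{i}"]
        children[f"a{i}"] = [f"m{i + 1}"]
        children[f"b{i}"] = [f"m{i + 1}"]
    return children
```

Ten diamonds contain only 40 edges, yet `count_paths(diamond_chain(10), "m0", 20)` comes to 4092 paths, because every merge doubles the number of ways to reach the versions after it. A purely linear history of 20 edges would give just 20.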
