Files without \n separator from February #18

Closed
klangner opened this Issue Nov 22, 2012 · 6 comments

Contributor

klangner commented Nov 22, 2012

I have just downloaded data for February and found problems in the following files:
2012-02-02-21.json.gz
2012-02-02-22.json.gz
2012-02-03-1.json.gz
2012-02-07-0.json.gz
2012-02-07-1.json.gz
2012-02-07-18.json.gz
2012-02-08-16.json.gz
2012-02-08-17.json.gz
2012-02-08-21.json.gz
2012-02-11-3.json.gz
2012-02-13-20.json.gz
2012-02-16-17.json.gz
2012-02-16-18.json.gz
2012-02-16-21.json.gz
2012-02-16-22.json.gz

In each file there is at least one case where two records are not separated by a newline.

BTW, I can send you a full report with the problematic records if you need it.
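A minimal sketch of how such files could still be read, assuming each record is a single JSON object and that the files are UTF-8 text once decompressed (neither is confirmed in this thread). The function names `iter_records` and `read_archive` are hypothetical. It uses `json.JSONDecoder.raw_decode` to pull objects out one after another, so a missing `\n` between two records does not matter:

```python
import gzip
import json


def iter_records(text):
    """Yield each JSON value in text, with or without newlines between records."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # Skip whitespace (including newlines) between records;
        # raw_decode does not accept leading whitespace.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        # raw_decode returns the parsed value and the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def read_archive(path):
    """Read one hourly .json.gz file (assumed UTF-8) and return its records."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return list(iter_records(handle.read()))
```

For example, `read_archive("2012-02-02-21.json.gz")` would return the records of the first file listed above. This reads the whole hour into memory, which keeps the sketch short; a streaming variant would need to buffer partial objects.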

I hope it helps :-)
Krzysztof

Contributor

klangner commented Nov 22, 2012

Looking at the data, I found that the repository URI is either:
https://api.github.dev/repos/Khan/khan-exercises
or
https://api.github.com/repos/Khan/khan-exercises

Is there any difference between them? It looks like it should be safe to merge events from both URIs, but maybe I'm missing something?

It looks like almost all repositories have both types, but I'm not sure, since I only checked a few of the most popular ones manually.
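If the two hosts do turn out to refer to the same repositories (the thread does not confirm this), one way to merge events is to rewrite the host before grouping by URI. The sketch below is only an illustration under that assumption; `normalize_repo_url` is a hypothetical helper, not part of any existing tool:

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_repo_url(url):
    """Map api.github.dev URLs onto api.github.com; leave other URLs untouched."""
    parts = urlsplit(url)
    # Assumption: both hosts point at the same repository data.
    if parts.netloc == "api.github.dev":
        parts = parts._replace(netloc="api.github.com")
    return urlunsplit(parts)
```

Applying it to both URIs above yields the same `api.github.com` address, so events keyed by the normalized URI end up in one group.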

Owner

igrigorik commented Nov 22, 2012

Sigh, thought I caught that one... Is that in the 2012 archives?

Contributor

klangner commented Nov 22, 2012

The problem with the URI is in February and March 2012.

I found the same problem in the files for June and July 2012.

Any suggestions?

igrigorik closed this Jul 23, 2014
