gtfs csv double quotes error #41

Closed
stefanocudini opened this issue Oct 9, 2020 · 5 comments
Labels
bug Something isn't working

Comments

@stefanocudini
Contributor

Running pelias import transit causes this error:

pelias import transit
info: [transit] Importing 1 transit feedsArray.
info: [wof-pip-service:master] starting with layers neighbourhood,borough,locality,localadmin,county,macrocounty,macroregion,region,dependency,country,empire,continent,marinearea,ocean
info: [transit] Creating read stream for: /data/transit/stops.txt
info: [transit] Total time taken: .844s
events.js:200
      throw er; // Unhandled 'error' event
CsvError: Invalid Opening Quote: a quote is found inside a field at line 17

The data is malformed; the row with ID 17 contains double quotes inside an unquoted field:

,24405x,Maso Bolleri,,46.102234,11.123940,10110,2
9,25205x,Borino,,46.067115,11.165639,10110,2
10,28205z,Cadine Strada Gardesana,,46.088630,11.065018,10110,1
11,28205x,Cadine Strada Gardesana,,46.088729,11.064509,10110,1
12,22110c,Canova Paludi,,46.099170,11.109314,10110,1
13,22015x,Gardolo Materna Paludi,,46.103846,11.108235,10110,2
14,21220-,Centro Commerciale,,46.091584,11.105652,10110,
15,21105z,Asiago S.Bartolameo,,46.047972,11.137456,10110,1
16,21100x,Asiago Banala,,46.047681,11.137741,10110,1
17,24015x,Cognola "Toresela",,46.078904,11.153473,10110,2
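
A quick way to locate rows like this before importing is to scan the file for double quotes that appear inside unquoted fields. The sketch below is a minimal, hypothetical pre-check and is not part of pelias or node-csv-parse; the function name findStrayQuoteLines is made up for illustration, and it assumes one record per line (no quoted fields spanning line breaks).

```javascript
// Return the 1-based numbers of lines that contain a double quote inside an
// unquoted field, or that end while a quoted field is still open.
// Assumption: one record per line (quoted fields never span line breaks).
function findStrayQuoteLines(csvText) {
  const flagged = [];
  const lines = csvText.split(/\r?\n/);
  lines.forEach((line, index) => {
    let inQuotes = false;    // currently inside a field that opened with a quote
    let atFieldStart = true; // no character of the current field seen yet
    let bad = false;
    for (let i = 0; i < line.length; i++) {
      const ch = line[i];
      if (inQuotes) {
        if (ch === '"') {
          if (line[i + 1] === '"') {
            i++; // escaped quote (""), skip the second one
          } else {
            inQuotes = false; // closing quote
          }
        }
      } else if (ch === ',') {
        atFieldStart = true; // next character begins a new field
      } else if (ch === '"') {
        if (atFieldStart) {
          inQuotes = true; // legitimate opening quote
          atFieldStart = false;
        } else {
          bad = true; // quote in the middle of an unquoted field
          break;
        }
      } else {
        atFieldStart = false;
      }
    }
    if (bad || inQuotes) {
      flagged.push(index + 1);
    }
  });
  return flagged;
}

const sample = [
  '16,21100x,Asiago Banala,,46.047681,11.137741,10110,1',
  '17,24015x,Cognola "Toresela",,46.078904,11.153473,10110,2',
].join('\n');
console.log(findStrayQuoteLines(sample)); // line 2 of the sample is flagged
```

The line numbers it reports are positions in the text passed in, so they only match the parser's message if the whole file (header included) is scanned.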
stefanocudini added the bug label on Oct 9, 2020
@missinglink
Member

missinglink commented Oct 9, 2020

We're using this popular npm module to parse the CSV:
https://github.com/adaltas/node-csv-parse

I don't think there is a formal CSV specification, but this RFC (RFC 4180) indicates how to encode quotes for the widest support:

Definition of the CSV Format

   While there are various specifications and implementations for the
   CSV format (for ex. [4], [5], [6] and [7]), there is no formal
   specification in existence, which allows for a wide variety of
   interpretations of CSV files.  This section documents the format that
   seems to be followed by most implementations:

...

   6.  Fields containing line breaks (CRLF), double quotes, and commas
       should be enclosed in double-quotes.  For example:

       "aaa","b CRLF
       bb","ccc" CRLF
       zzz,yyy,xxx

   7.  If double-quotes are used to enclose fields, then a double-quote
       appearing inside a field must be escaped by preceding it with
       another double quote.  For example:

       "aaa","b""bb","ccc"

I see a few options:

  • Ask users to quote fields
  • File a feature request with node-csv-parse to attempt to detect this form of quoting
  • Replace node-csv-parse with another npm module that is more lenient

@missinglink
Member

If the line is rewritten according to those rules it should parse correctly:

17,24015x,Cognola "Toresela",,46.078904,11.153473,10110,2
17,24015x,"Cognola ""Toresela""",,46.078904,11.153473,10110,2
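
For anyone generating the feed themselves, the two RFC rules quoted above can be applied when writing each field. This is a minimal sketch only; quoteField and toCsvLine are hypothetical helper names, not functions from node-csv-parse or pelias.

```javascript
// Quote a single CSV field following the two rules quoted above:
// enclose fields containing quotes, commas, or line breaks in double quotes,
// and double any quote that appears inside the field.
function quoteField(value) {
  const text = String(value);
  // Fields without special characters are written unchanged.
  if (!/[",\r\n]/.test(text)) {
    return text;
  }
  return '"' + text.replace(/"/g, '""') + '"';
}

// Join already-split field values into one CSV line.
function toCsvLine(fields) {
  return fields.map(quoteField).join(',');
}

const row = ['17', '24015x', 'Cognola "Toresela"', '', '46.078904', '11.153473', '10110', '2'];
console.log(toCsvLine(row));
// prints: 17,24015x,"Cognola ""Toresela""",,46.078904,11.153473,10110,2
```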

@missinglink
Member

Looking at the runtime options for node-csv-parse, I found this which might work?

Screenshot 2020-10-09 at 15 29 27

@stefanocudini
Contributor Author

stefanocudini commented Oct 12, 2020

@missinglink thank you for your suggestions!

I think my GTFS data is in fact non-standard :-/

Looking at the runtime options for node-csv-parse, I found this which might work?

Screenshot 2020-10-09 at 15 29 27

Can I pass this parameter through the pelias importer configuration, or do I need to change the code? I could open a PR.

@stefanocudini
Contributor Author

Tested, and it works now. I'm updating PR #43.
