Revamp of the LOAD CSV tutorial #470
AlexicaWright
left a comment
This is a review of the "Working with CSV files" section. I'll have to review the tutorial separately.
|
|
||
| === Data format | ||
|
|
||
| All data from the CSV file is read as a string, so you need to use `toInteger()`, `toFloat()`, `split()`, or similar functions to convert values, when needed. |
| All data from the CSV file is read as a string, so you need to use `toInteger()`, `toFloat()`, `split()`, or similar functions to convert values, when needed. | |
| Neo4j reads all data from the CSV file as a string. For other data types, you need to use `toInteger()`, `toFloat()`, `toBoolean()`, or similar functions to convert data to the appropriate type. |
`split()` doesn't change the data type from string; it splits the value into a list of strings, so it feels odd to group it with the functions that change the data type. It's mentioned later though, so maybe it's ok?
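To illustrate the distinction raised here, a minimal sketch follows. The inline string values are made up and only stand in for fields that `LOAD CSV` would read from a file; the column aliases are illustrative as well.

```cypher
// Inline values stand in for CSV fields, which LOAD CSV always reads as strings.
WITH "42" AS quantity, "9.99" AS price, "true" AS active, "red;green;blue" AS tags
RETURN
  toInteger(quantity) AS quantityAsInteger, // string converted to an integer
  toFloat(price)      AS priceAsFloat,      // string converted to a float
  toBoolean(active)   AS activeAsBoolean,   // string converted to a boolean
  split(tags, ";")    AS tagList;           // list of strings; the elements are still strings
```

The first three functions change the type of the value, while `split()` only changes its shape, which is the reason it may read better as a separate point.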
| === Field terminator | ||
|
|
||
| Also known as delimiter, a field terminator is a character used to separate each field in a CSV file. | ||
| In this example, a comma (`,`) is used, but other characters, such as a tab (`\`) or a pipe (`|`) also work and they can be blended: |
| In this example, a comma (`,`) is used, but other characters, such as a tab (`\`) or a pipe (`|`) also work and they can be blended: | |
| In this example, a comma (`,`) is used, but other characters, such as a tab (`\t`) or a pipe (`|`) also work and they can be blended: |
Not sure if we need to escape the tab to make it render?
If you use a tab, the format is called TSV.
No, the tab is working normally here, both building locally and in Surge. Regarding the TSV file format, do you think it's better to mention it or to remove the tab option as it would make the file a TSV instead of a CSV, which is the topic of this page?
CSV and TSV are both flat files and there is no other difference AFAIK. About the tab, it's not just a backslash but also a t: `\t`.
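For reference, a tab-separated file could be loaded along these lines. This is only a sketch: the file name `people.tsv` is hypothetical and the file is assumed to sit in the import directory.

```cypher
// Hypothetical file name; FIELDTERMINATOR overrides the default comma with a tab.
LOAD CSV WITH HEADERS FROM 'file:///people.tsv' AS row FIELDTERMINATOR '\t'
RETURN row
LIMIT 5;
```

The tab is written as the two-character escape `\t` inside the quoted string, which matches the suggested change.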
| For best performance, always `MATCH` and `MERGE` on a single label with the indexed primary-key property. | ||
| ==== | ||
|
|
||
| Suppose you use xref:#_converting_data_values[the preceding *companies.csv* file], and now you have a file that contains people and which companies they work for: |
| Suppose you use xref:#_converting_data_values[the preceding *companies.csv* file], and now you have a file that contains people and which companies they work for: | |
| Suppose that you have another file that contains people and which companies they work for using a reference to the xref:#_converting_data_values[*companies.csv* file]: |
| 4,Karen White,1 | ||
| ---- | ||
|
|
||
| You should also separate node and relationship creation on a separate processing. |
| You should also separate node and relationship creation on a separate processing. | |
| To load these two files and create the appropriate relationships between the people from the `people.csv` file with the companies they work for in the `companies.csv` file, you need to load them both and first create nodes from the files, and then create the relationships between them. | |
| To make this process more efficient, it is recommended to separate these tasks, i.e. create the nodes in one clause per file, and then a separate clause to create the relationships. |
|
|
||
| [source,cypher,role=noplay] | ||
| ---- | ||
| // clear data |
Is this necessary?
| MATCH (e:Employee {employeeId: row.employeeId}) | ||
| MATCH (c:Company {companyId: row.Company}) | ||
| MERGE (e)-[:WORKS_FOR]->(c) | ||
| RETURN *; |
What is returned here?
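As a sketch of the pattern discussed in this thread (nodes first, relationships in a separate pass), something like the following could work. It reuses the `employeeId`, `Company`, and `companyId` headers and the file names from the excerpts above; the constraint names and the final `count(*)` are assumptions added for illustration, not part of the PR.

```cypher
// Optional: uniqueness constraints (names are hypothetical) so that MERGE and MATCH use an index.
CREATE CONSTRAINT company_id IF NOT EXISTS FOR (c:Company) REQUIRE c.companyId IS UNIQUE;
CREATE CONSTRAINT employee_id IF NOT EXISTS FOR (e:Employee) REQUIRE e.employeeId IS UNIQUE;

// Pass 1: create the company nodes.
LOAD CSV WITH HEADERS FROM 'file:///companies.csv' AS row
MERGE (c:Company {companyId: row.companyId});

// Pass 2: create the employee nodes.
LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row
MERGE (e:Employee {employeeId: row.employeeId});

// Pass 3: create the relationships between existing nodes.
LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row
MATCH (e:Employee {employeeId: row.employeeId})
MATCH (c:Company {companyId: row.Company})
MERGE (e)-[:WORKS_FOR]->(c)
RETURN count(*) AS rowsProcessed;
```

On the question above: `RETURN *` returns every variable in scope, which in the quoted query would be `row`, `e`, and `c` for each line of the file. Returning a count instead keeps the output small.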
Co-authored-by: Jessica Wright <49636617+AlexicaWright@users.noreply.github.com>
Co-authored-by: Jessica Wright <49636617+AlexicaWright@users.noreply.github.com>
…rt subpage and admonition for GraphAcademy
…d into csv-import
AlexicaWright
left a comment
Some further comments, but we're getting there! Thank you @lidiazuin !
| -- | ||
|
|
||
| Here, the movie and person data (including the IDs) is repeated in different rows every time new information about a particular actor's role is featured. | ||
| This sort of duplication compromises the structure of the data, which means you need to xref:#_preparing_the_csv_file[prepare your file] before importing. |
Maybe this can be rephrased a little? The duplication doesn't really compromise the structure of the data in general, does it? Only if you want your data in a graph structure.
Also, the link doesn't work.
| * xref:data-import/csv-files.adoc[*Working with CSV files*]: read about the structure of a CSV file and understand how data is organized. | ||
| * xref:data-import/csv-files.adoc#_cleaning_up[*Cleaning up CSV files*]: see how to use the `LOAD CSV` command to clean up the file while importing. | ||
| * xref:data-import/csv-files.adoc#_optimization[*Optimization*]: improve performance when working with large amounts of data or complex loading. |
These three are all on the same page. Wouldn't it suffice to say "See xref:data-import/csv-files.adoc[Working with CSV files] to learn more about the structure of data, how to clean it up, and optimize it."?
|
|
||
| == Methods comparison | ||
|
|
||
| The following table shows all supported methods for importing data into Neo4j: |
The plot grows thicker.. In Desktop2, "Import" is available, but only for CSV. It is built-in so it's not standalone Data Importer...
You mean the Open folder > Import option?
No, Desktop2 has Importer built in, just like the Aura console, but it only supports csv files.
Is it already released? Because then I need to update this, I think
Co-authored-by: Jessica Wright <49636617+AlexicaWright@users.noreply.github.com>
AlexicaWright
left a comment
Some more comments ;)
| [source,cypher] | ||
| -- | ||
| LOAD CSV [WITH HEADERS] FROM url [AS alias] [FIELDTERMINATOR char] | ||
| -- |
| -- | |
| -- | |
| If you include the optional `WITH HEADERS`, the first line of the CSV file is treated as a header and each row is treated as a map of key-value pairs rather than a list of values. | |
| `FROM` specifies the location of the file, whether it is local or over the internet, and it cannot be omitted. | |
| `AS alias` names each row for reference. | |
| The default field terminator in CSV files is the comma, but others are supported and can be specified using the parsing option `FIELDTERMINATOR`. |
This is just a suggestion, but since it is a tutorial about this command, I think it's worthwhile to break down the basic command and explain what each part does.
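A short sketch of how the parts described in the suggestion fit together, assuming a `people.csv` file with a header line in the import directory:

```cypher
// WITH HEADERS: each row is a map keyed by the header names.
// FIELDTERMINATOR is spelled out here, although the comma is already the default.
LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row FIELDTERMINATOR ','
RETURN row
LIMIT 3;

// Without WITH HEADERS: each row is a list of strings, addressed by position.
// The header line itself is then returned as the first row.
LOAD CSV FROM 'file:///people.csv' AS row
RETURN row[0] AS firstField
LIMIT 3;
```

Showing both forms next to each other may make the map versus list difference easier to see than a description alone.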
| //Example 2 - file placed in subdirectory within import directory (import/northwind/customers.csv) | ||
| LOAD CSV FROM "file:///northwind/customers.csv" | ||
| ---- | ||
| This is the content of the example `people.csv` file: |
Maybe use the result of running that command instead?
| MERGE (c:Company {companyId: row.companyId}) | ||
| MERGE (e)-[r:WORKS_FOR]->(c) | ||
| ---- | ||
| Note that the `FIELDTERMINATOR` wasn’t specified in the `LOAD CSV` clause because the default value is a comma. |
If you explain it on first mention, you can omit this. I added a suggestion for that.
| . Make sure header names match those in the CSV file. | ||
|
|
||
| The `neo4j-admin database import` command can be used for the initial graph population only. | ||
| . Search for typos in the data and in the queries. |
This seems like something to do once you know something is inaccurate?
| * Type conversion is possible by suffixing the name with indicators like `:INT`, `:BOOLEAN`, etc. | ||
|
|
||
| For more details on this header format and the tool, see the section in the link:https://neo4j.com/docs/operations-manual/current/tools/neo4j-admin/neo4j-admin-import/[Neo4j Operations Manual -> Neo4j Admin import^] and the accompanying link:https://neo4j.com/docs/operations-manual/current/tutorial/neo4j-admin-import/[tutorial^]. | ||
| == Model your data |
This section is very confusing. I suggest deleting it and linking to the chapter on data modeling instead.
Co-authored-by: Jessica Wright <49636617+AlexicaWright@users.noreply.github.com>
AlexicaWright
left a comment
Second part reviewed! Looking great Lidia!!
|
|
||
| === Field terminator | ||
|
|
||
| Also known as delimiter, a field terminator is a character used to separate each field in a CSV file. |
Just a thought, but how would the LOAD CSV command work with more than one field terminator?
| RETURN row | ||
| -- | ||
|
|
||
| Or from a local folder, if you use an on-premise deployment. |
So this doesn't work in Aura? Maybe test to see?
It doesn't
| If you want to open your CSV file from another location, you need to change the link:https://neo4j.com/docs/operations-manual/2025.03/configuration/configuration-settings/#config_server.directories.import[`server.directories.import`] settings. | ||
|
|
||
| [IMPORTANT] | ||
| ==== |
This is a very long admonition. Could it be shortened or rewritten as a regular paragraph (i.e. not an admonition)?
|
|
||
| * `*toInteger()*`: converts a value to an integer. | ||
| * `*toFloat()*`: converts a value to a float (e.g. for monetary amounts). | ||
| * `*datetime()*`: converts a value to a `DateTime`. |
For consistency, we should use code formatting either for all data types or for none. I suggest using it for all of them, so `STRING`, `FLOAT`, etc.
Co-authored-by: Jessica Wright <49636617+AlexicaWright@users.noreply.github.com>
Thanks for the documentation updates. The preview documentation has now been torn down - reopening this PR will republish it.
AlexicaWright
left a comment
Alright! Massive! Looks good! Well done @lidiazuin !
* Revamp of the LOAD CSV tutorial
* Replacing links to neo4j admin commands to tutorial
* Update index.adoc
* Apply suggestions from code review
  Co-authored-by: Jessica Wright <49636617+AlexicaWright@users.noreply.github.com>
* Apply suggestions from code review
  Co-authored-by: Jessica Wright <49636617+AlexicaWright@users.noreply.github.com>
* removing redundant tutorial for neo4j desktop and finalizing pages
* remove unused images
* reverting changes in apackage-lock
* adding more info about modeling
* Delete package-lock.json
* Removing links to the Neo4j Desktop tutorial, adding link to the import subpage and admonition for GraphAcademy
* fixes after review
* Apply suggestions from code review
  Co-authored-by: Jessica Wright <49636617+AlexicaWright@users.noreply.github.com>
* fixes after review
* Apply suggestions from code review
  Co-authored-by: Jessica Wright <49636617+AlexicaWright@users.noreply.github.com>
* Apply suggestions from code review
  Co-authored-by: Jessica Wright <49636617+AlexicaWright@users.noreply.github.com>
* updates after review
* adding neo4j desktop to methods comparison
---------
Co-authored-by: Jessica Wright <49636617+AlexicaWright@users.noreply.github.com>
* Revamp of the LOAD CSV tutorial (#470)
  * Revamp of the LOAD CSV tutorial
  * Replacing links to neo4j admin commands to tutorial
  * Update index.adoc
  * Apply suggestions from code review
    Co-authored-by: Jessica Wright <49636617+AlexicaWright@users.noreply.github.com>
  * Apply suggestions from code review
    Co-authored-by: Jessica Wright <49636617+AlexicaWright@users.noreply.github.com>
  * removing redundant tutorial for neo4j desktop and finalizing pages
  * remove unused images
  * reverting changes in apackage-lock
  * adding more info about modeling
  * Delete package-lock.json
  * Removing links to the Neo4j Desktop tutorial, adding link to the import subpage and admonition for GraphAcademy
  * fixes after review
  * Apply suggestions from code review
    Co-authored-by: Jessica Wright <49636617+AlexicaWright@users.noreply.github.com>
  * fixes after review
  * Apply suggestions from code review
    Co-authored-by: Jessica Wright <49636617+AlexicaWright@users.noreply.github.com>
  * Apply suggestions from code review
    Co-authored-by: Jessica Wright <49636617+AlexicaWright@users.noreply.github.com>
  * updates after review
  * adding neo4j desktop to methods comparison
  ---------
  Co-authored-by: Jessica Wright <49636617+AlexicaWright@users.noreply.github.com>
* fixing broken links and removing unused image
---------
Co-authored-by: Jessica Wright <49636617+AlexicaWright@users.noreply.github.com>
Review of the csv-import.adoc page and addition of the csv-file.adoc page for general reference.