
apoc.load.csv does not close the file on consumption end #4078

Open
Ava-S opened this issue May 16, 2024 · 1 comment

Comments


Ava-S commented May 16, 2024

I use Python to preprocess a file and then load it into the database. More specifically, I have defined the following import flow:
1. Users specify the location of the input file
2. The code preprocesses the file
3. The code requests the database's import directory and moves the preprocessed file there (see the sketch after this list)
4. The file is imported using apoc.load.csv
5. Once the import is done, I tidy up by deleting the file from the import folder.
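
Roughly, steps 1-3 look like the sketch below (simplified; preprocess() is a hypothetical stand-in for my actual preprocessing, and get_import_directory() is the same helper used in the code further down):

from pathlib import Path
import shutil

def stage_file(self, input_path):
    # Step 2: preprocess the user-supplied file (hypothetical helper)
    preprocessed = self.preprocess(input_path)
    # Step 3: look up the database's import directory and move the file there
    import_dir = Path(self.get_import_directory())
    target = import_dir / "import_file.csv"
    shutil.move(str(preprocessed), str(target))
    return target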

However, after upgrading my database from version 5.9.0 to 5.10.0, I can no longer delete the file from the import folder, as it is still held open by the database. The error persists in version 5.17 (last tested).

Expected Behavior (Mandatory)

After a file is imported using apoc.load.csv, the file should be closed when consumption ends, so that other processes can access it.

Actual Behavior (Mandatory)

The issue arises when I attempt to delete the file post-import. I encounter a PermissionError, signaling that the file is still in use by another process. It seems the database is holding onto the file longer than anticipated, causing a conflict with my cleanup operation.

More specifically, I get this error: PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: '<neo4j>\\import\\import_file.csv'
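
For illustration, a defensive cleanup that retries on PermissionError would look something like the sketch below (not part of my actual code, just to show the kind of workaround this forces):

import os
import time

def remove_with_retry(path, attempts=10, delay=1.0):
    # Keep retrying the delete for a while in case the file handle is released late
    for _ in range(attempts):
        try:
            os.remove(path)
            return True
        except PermissionError:
            time.sleep(delay)
    return False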

How to Reproduce the Problem

Simple Dataset (where it's possible)

The specific dataset does not matter; the issue occurs with any dataset I try to import.

Python Code

This is the Python code I use to import and delete the file.

import os
from pathlib import Path

# Import the file in batches with apoc.periodic.iterate over apoc.load.csv
def import_file(tx):
    tx.run('''CALL apoc.periodic.iterate(
                  'CALL apoc.load.csv("import_file.csv") YIELD map AS row RETURN row',
                  'CREATE (record:Record) SET record += row',
                  {batchSize: 10000, parallel: true, retries: 1});''')

with self.driver.session(database="neo4j") as session:
    session.execute_write(import_file)

# Delete the file from the import directory once the import has finished
path = Path(self.get_import_directory(), "import_file.csv")
os.remove(path)  # raises PermissionError [WinError 32] while the database still holds the file

Steps (Mandatory)

  1. Import data using apoc.load.csv with the Python neo4j driver
  2. Delete the file directly afterwards using Python
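
For completeness, a self-contained version of these steps (a sketch; the connection URI, credentials and import path are placeholders, and the import query is simplified to a plain apoc.load.csv call):

import os
from pathlib import Path
from neo4j import GraphDatabase

IMPORT_DIR = Path(r"C:\neo4j\import")  # placeholder for the actual import directory

def import_file(tx):
    tx.run('CALL apoc.load.csv("import_file.csv") YIELD map AS row '
           'CREATE (record:Record) SET record += row')

driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))
with driver.session(database="neo4j") as session:
    session.execute_write(import_file)
driver.close()

# Fails with PermissionError [WinError 32] while the database still holds the file
os.remove(IMPORT_DIR / "import_file.csv")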

Specifications (Mandatory)

Versions

  • OS: Windows 11 Enterprise
  • Neo4j: v5.10
  • Neo4j-Apoc (and extended): v5.10
  • Neo4j driver: v5.10
  • Python v3.11

vga91 (Collaborator) commented Jun 28, 2024

The error also seems to occur without apoc.periodic.iterate, even when running apoc.load.csv directly in Neo4j Browser/Desktop without any Python code and then trying to delete the file via File Explorer.

It could be an error in Neo4j itself, as the code for apoc.load.csv has not changed.

I opened an issue on the neo4j kernel repository, to investigate both sides:
neo4j/neo4j#13480

Projects
Status: Blocked