
Sometimes logs are appended to an existing log file of a previous run #131

Closed
lucafaggianelli opened this issue Jun 9, 2023 · 3 comments · Fixed by #148
Labels
bug Something isn't working

Comments

@lucafaggianelli
Owner

lucafaggianelli commented Jun 9, 2023

If run 2 is running, log file 2 stays empty while log file 1 contains both its own logs and run 2's logs.

Update

I was able to reproduce it: the issue always happens, except for the first pipeline you run after starting or restarting the server.

Restarting the server temporarily solves the problem, but once you run a pipeline, all successive runs log to the first pipeline's file.

Fix analysis

Create a logger that's unique for each pipeline run, not simply for each pipeline. Otherwise successive runs always use the same log file: the logger is always the same instance, so the json_handler for the new run's file is never added to it.
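A minimal sketch of the proposed fix, assuming the standard library `logging` module. The function and class names (`get_run_logger`, `JsonFormatter`) and the `.data/runs/run_N/logs.jsonl` layout are taken from this thread, but the actual plombery implementation may differ:

```python
import json
import logging
from pathlib import Path


def get_run_logger(pipeline_id: str, run_id: int) -> logging.Logger:
    """Return a logger unique to this *run*, with its own JSON file handler."""
    # Keying the logger name on the run ID guarantees a fresh logger per run,
    # so the handler pointing at the new run's file is always attached.
    logger = logging.getLogger(f"plombery.{pipeline_id}.run_{run_id}")
    if not logger.handlers:
        log_file = Path(f".data/runs/run_{run_id}/logs.jsonl")
        log_file.parent.mkdir(parents=True, exist_ok=True)
        handler = logging.FileHandler(log_file)
        handler.setFormatter(JsonFormatter(pipeline_id))
        logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    return logger


class JsonFormatter(logging.Formatter):
    """Minimal JSONL formatter mirroring the fields seen in logs.jsonl."""

    def __init__(self, pipeline: str):
        super().__init__()
        self.pipeline = pipeline

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            "loggerName": record.name,
            "pipeline": self.pipeline,
        })
```

With a per-pipeline logger name (the buggy variant), `logging.getLogger` returns the same cached instance on every run, the `if not logger.handlers` guard is never re-entered, and every run keeps writing to run 1's file, which matches the symptoms reported above.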

@lucafaggianelli lucafaggianelli added the bug Something isn't working label Jun 9, 2023
@lucafaggianelli
Owner Author

Not able to reproduce. The root cause of this issue may be bound to Python's ContextVar logic used to get the current run ID; it probably happened because an exception in the code left the context var in a corrupted state.

@Kikobyte

Kikobyte commented Sep 9, 2023

Steps to reproduce:

  • In an empty folder, create a requirements.txt file with two lines: "plombery" and "uvicorn"
  • Run pipenv3 --python 3.11 install -r requirements.txt, then pipenv3 shell (reproduces with both python 3.11 and 3.9)
  • Copy the src/app.py file from the "your first pipeline" tutorial (https://lucafaggianelli.github.io/plombery/create-a-pipeline/)
  • Run python src/app.py
  • Open localhost:8000 and, from the top-level page, press the "Run" button multiple times.

Expected result:

  • Every run should have its own logs

Actual result:

  • All the logs belong to the very first run
$ cat .data/runs/run_1/logs.jsonl 
{"level": "INFO", "message": "Executing task fetch_raw_sales_data", "loggerName": "plombery.sales_pipeline", "timestamp": "2023-09-09T15:16:15.692Z", "pipeline": "sales_pipeline", "task": null}
{"level": "DEBUG", "message": "Fetching sales data...", "loggerName": "plombery.fetch_raw_sales_data", "timestamp": "2023-09-09T15:16:15.692Z", "pipeline": "sales_pipeline", "task": "fetch_raw_sales_data"}
{"level": "INFO", "message": "Fetched 50 sales data rows", "loggerName": "plombery.fetch_raw_sales_data", "timestamp": "2023-09-09T15:16:15.693Z", "pipeline": "sales_pipeline", "task": "fetch_raw_sales_data"}
{"level": "INFO", "message": "Executing task fetch_raw_sales_data", "loggerName": "plombery.sales_pipeline", "timestamp": "2023-09-09T15:16:46.473Z", "pipeline": "sales_pipeline", "task": null}
{"level": "DEBUG", "message": "Fetching sales data...", "loggerName": "plombery.fetch_raw_sales_data", "timestamp": "2023-09-09T15:16:46.473Z", "pipeline": "sales_pipeline", "task": "fetch_raw_sales_data"}
{"level": "INFO", "message": "Fetched 50 sales data rows", "loggerName": "plombery.fetch_raw_sales_data", "timestamp": "2023-09-09T15:16:46.473Z", "pipeline": "sales_pipeline", "task": "fetch_raw_sales_data"}
{"level": "INFO", "message": "Executing task fetch_raw_sales_data", "loggerName": "plombery.sales_pipeline", "timestamp": "2023-09-09T15:18:44.596Z", "pipeline": "sales_pipeline", "task": null}
{"level": "DEBUG", "message": "Fetching sales data...", "loggerName": "plombery.fetch_raw_sales_data", "timestamp": "2023-09-09T15:18:44.597Z", "pipeline": "sales_pipeline", "task": "fetch_raw_sales_data"}
{"level": "INFO", "message": "Fetched 50 sales data rows", "loggerName": "plombery.fetch_raw_sales_data", "timestamp": "2023-09-09T15:18:44.597Z", "pipeline": "sales_pipeline", "task": "fetch_raw_sales_data"}

Notes:
Restarting python src/app.py "forgets" about the run that was stuck receiving all the logs, and after a couple of restarts:

$ find . -name "logs.jsonl" -exec wc -l {} \; | sort
0 ./.data/runs/run_10/logs.jsonl
0 ./.data/runs/run_11/logs.jsonl
0 ./.data/runs/run_2/logs.jsonl
0 ./.data/runs/run_3/logs.jsonl
0 ./.data/runs/run_5/logs.jsonl
0 ./.data/runs/run_6/logs.jsonl
0 ./.data/runs/run_8/logs.jsonl
6 ./.data/runs/run_7/logs.jsonl
9 ./.data/runs/run_1/logs.jsonl
9 ./.data/runs/run_4/logs.jsonl
9 ./.data/runs/run_9/logs.jsonl

@lucafaggianelli
Owner Author

Thanks for the info, I'll try to reproduce it following your steps
