Too many files causes the logging and the shell scripts to crash #58

nishant-vashisth · 2020-01-23T15:12:52Z

When there are too many files in the allure-results folder (over 50K, generated by 3000 tests in 5 - 10 runs), the clean-results and clean-history APIs fail:

Cleaning results
/app/cleanAllureResults.sh: line 5: /usr/bin/find: Argument list too long

I think it's unwise to expect even Linux to handle this many files, but is there any way this can be sorted out?

Is it possible to batch the files into multiple folders?

Secondly, as the number of files in the folder increases, the response listing every file name becomes overwhelming.
The web service has to transfer too much data to return a list of 10,000+ file names in the response.

If the list needs to be returned, there should be an env variable controlling the verbosity of the response.
Would it make sense to reply with only the count of the files present? That still runs into the issue mentioned above: listing so many files almost crashes the Python process by exhausting memory.
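The `Argument list too long` error quoted above is what the kernel reports when a command is started with more argument data than it accepts, which usually happens when a shell glob expands to tens of thousands of paths. Line 5 of cleanAllureResults.sh is not shown in this thread, so the following is only a minimal Python sketch of the general workaround, not the project's fix: walk the directory lazily and remove entries one by one, so no huge argument list is ever built. The function name `remove_files_iteratively` and the `keep` parameter are hypothetical.

```python
import os


def remove_files_iteratively(directory, keep=("history",)):
    """Delete regular files in `directory` one at a time; return how many were removed.

    Entries whose names are in `keep` (hypothetical default: the history
    folder) are left untouched.
    """
    removed = 0
    # os.scandir yields entries lazily, so the listing is never turned
    # into a command line or into one large in-memory list.
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.name in keep:
                continue
            if entry.is_file(follow_symlinks=False):
                os.unlink(entry.path)
                removed += 1
    return removed
```

Because each file is removed with its own `os.unlink` call, the number of files affects only the running time, not the size of any single system call.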

fescobar · 2020-01-23T15:19:42Z

@nishantvas ok, let me reproduce that and think about what the best solution would be. Thanks

nishant-vashisth · 2020-01-23T15:23:16Z

Great, let me know if I can provide more details.
For now, I simply clean the results before I start publishing results from any new runs.
So there are at most 8 - 9K files at any given time.

This way, I keep the history but lose the individual run details.

fescobar · 2020-01-25T10:19:28Z

@nishantvas Can you attach the full log from the container, please?

nishant-vashisth · 2020-01-27T08:59:23Z

Not checking results automatically
ALLURE_VERSION: 2.13.0
Generating default report
Generating report

Serving Flask app "app" (lazy loading)
Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
Debug mode: off
Running on http://0.0.0.0:5050/ (Press CTRL+C to quit)
10.19.0.65 - - [22/Jan/2020 12:14:41] "GET /swagger/ HTTP/1.1" 200 -
10.19.0.65 - - [22/Jan/2020 12:14:41] "GET /swagger/swagger-ui.css HTTP/1.1" 200 -
10.19.0.65 - - [22/Jan/2020 12:14:41] "GET /swagger/swagger-ui-bundle.js HTTP/1.1" 200 -
10.19.0.65 - - [22/Jan/2020 12:14:41] "GET /swagger/swagger-ui-standalone-preset.js HTTP/1.1" 200 -
10.19.0.65 - - [22/Jan/2020 12:14:41] "GET /static/swagger.json HTTP/1.1" 200 -
10.19.0.65 - - [22/Jan/2020 12:14:41] "GET /swagger/favicon-32x32.png HTTP/1.1" 200 -
Report successfully generated to allure-report
127.0.0.1 - - [22/Jan/2020 12:14:56] "GET /emailable-report/render HTTP/1.1" 200 -
Status: 200
Starting web server...
2020-01-22 12:14:57.575:INFO::main: Logging initialized @750ms to org.eclipse.jetty.util.log.StdErrLog
Can not open browser because this capability is not supported on your platform. You can use the link below to open the report manually.
Server started at http://10.19.0.13:4040/. Press <Ctrl+C> to exit
Cleaning results
/app/cleanAllureResults.sh: line 5: /usr/bin/find: Argument list too long
Creating history on results directory...
Copying history from previous results...
Generating report
Report successfully generated to allure-report
10.19.0.65 - - [22/Jan/2020 12:16:14] "GET /clean-results HTTP/1.1" 400 -
127.0.0.1 - - [22/Jan/2020 12:16:46] "GET /emailable-report/render HTTP/1.1" 200 -
Status: 200
10.19.0.65 - - [22/Jan/2020 12:16:46] "GET /clean-results HTTP/1.1" 200 -
Cleaning results
Creating history on results directory...
Copying history from previous results...
Generating report
Report successfully generated to allure-report
127.0.0.1 - - [22/Jan/2020 12:17:08] "GET /emailable-report/render HTTP/1.1" 200 -
Status: 200
10.19.0.65 - - [22/Jan/2020 12:17:08] "GET /clean-results HTTP/1.1" 200 -
Cleaning history
Creating history on results directory...
Copying history from previous results...
Generating report
Report successfully generated to allure-report
127.0.0.1 - - [22/Jan/2020 12:17:36] "GET /emailable-report/render HTTP/1.1" 200 -
Status: 200
10.19.0.65 - - [22/Jan/2020 12:17:36] "GET /clean-history HTTP/1.1" 200 -
10.19.0.65 - - [22/Jan/2020 12:31:40] "GET /version HTTP/1.1" 200 -
10.19.0.65 - - [22/Jan/2020 12:32:00] "GET /version HTTP/1.1" 200 -
10.19.0.65 - - [22/Jan/2020 12:32:01] "POST /send-results HTTP/1.1" 200 -
10.19.0.65 - - [22/Jan/2020 12:32:01] "POST /send-results HTTP/1.1" 200 -
10.19.0.65 - - [22/Jan/2020 12:32:02] "POST /send-results HTTP/1.1" 200 -
10.19.0.65 - - [22/Jan/2020 12:32:02] "POST /send-results HTTP/1.1" 200 -
10.19.0.65 - - [22/Jan/2020 12:32:02] "POST /send-results HTTP/1.1" 200 -
10.19.0.65 - - [22/Jan/2020 12:32:02] "POST /send-results HTTP/1.1" 200 -
10.19.0.65 - - [22/Jan/2020 12:32:02] "POST /send-results HTTP/1.1" 200 -
10.19.0.65 - - [22/Jan/2020 12:32:03] "POST /send-results HTTP/1.1" 200 -
10.19.0.65 - - [22/Jan/2020 12:32:03] "POST /send-results HTTP/1.1" 200 -
10.19.0.65 - - [22/Jan/2020 12:32:03] "POST /send-results HTTP/1.1" 200 -

fescobar · 2020-02-05T14:28:45Z

@nishantvas I think I have an idea. Working on this.

fescobar · 2020-02-05T22:21:52Z

@nishantvas I've released a beta version recently, with some changes to support more files. Could you please check your case with many files using this image: "frankescobar/allure-docker-service:beta"? (Remove all your local allure images first, just in case.) You can also use a new env variable to make the API responses less verbose.

    environment:
      API_RESPONSE_LESS_VERBOSE: 1

Please let me know whether this version resolves your problem. If it fails, please attach a log.

nishant-vashisth · 2020-02-06T07:17:55Z

Sure, I'll deploy this and let you know if I can reproduce the issue.
However, would you happen to know approximately how many files this can handle?

The way our tests are run, you can be sure that 100K files can pile up in less than a couple of days.
There must be some limit.

fescobar · 2020-02-06T09:14:12Z

@nishantvas I don't know what the limit is, but I no longer search for the files to delete. Instead, I store the history folder in a temporary directory and delete everything when you clean the results. It would help if you could tell me the size of the history directory in allure-results when you have 100K results, and also check whether there is a performance problem when you clean the results.
I'm not able to reproduce your scenario with real data.
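The approach described above (set the history folder aside in a temporary directory, delete everything, then put the history back) might look roughly like this in Python. This is a sketch under assumptions, not the code shipped in the beta image: the function name `clean_results_keep_history` is hypothetical, and only the `history` folder name comes from the logs in this thread.

```python
import os
import shutil
import tempfile


def clean_results_keep_history(results_dir, history_name="history"):
    """Empty `results_dir` but keep its history folder, if it has one."""
    history_path = os.path.join(results_dir, history_name)
    with tempfile.TemporaryDirectory() as tmp:
        saved = os.path.join(tmp, history_name)
        has_history = os.path.isdir(history_path)
        if has_history:
            # Park the history outside the directory being wiped.
            shutil.move(history_path, saved)
        # Delete the entries rather than the directory itself, so this
        # also works when results_dir is a mount point.
        with os.scandir(results_dir) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    shutil.rmtree(entry.path)
                else:
                    os.unlink(entry.path)
        if has_history:
            shutil.move(saved, history_path)
```

Nothing here needs to know which files belong to which run, which is why the number of result files stops mattering for correctness.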

nishant-vashisth · 2020-02-06T10:38:36Z

@fescobar, this does work for me and setting the env var helps a bit, but creating that many files will take some iterations to run

Can I suggest some changes in the code ?
It can be made relatively faster with some tweaks

fescobar · 2020-02-06T10:47:44Z

@nishantvas of course, can you create a pull request from the beta branch, or tell me what you think? Thanks

fescobar · 2020-02-06T10:51:39Z

@nishantvas Maybe you have to use 15 or NONE, but then you have to execute the generate-report endpoint after a certain amount of time.

    environment:
      CHECK_RESULTS_EVERY_SECONDS: NONE

    environment:
      CHECK_RESULTS_EVERY_SECONDS: 15

https://github.com/fescobar/allure-docker-service#updating-seconds-to-check-allure-results
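With automatic checking turned off, the report has to be requested explicitly. A minimal Python sketch of such a call follows; the `GET /generate-report` request and port 5050 appear in the logs in this thread, while the helper names and the timeout value are assumptions made for illustration.

```python
import urllib.request


def endpoint_url(base_url, endpoint):
    """Join the service base URL and an endpoint name into one URL."""
    return base_url.rstrip("/") + "/" + endpoint.lstrip("/")


def trigger_generate_report(base_url, timeout=600):
    """Send GET /generate-report and return the HTTP status code.

    The long default timeout is an assumption: generating a report from
    a very large results folder can take a while.
    """
    url = endpoint_url(base_url, "generate-report")
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status
```

A test pipeline could call `trigger_generate_report("http://localhost:5050")` once after the last upload of a run; the host name here is a placeholder.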

nishant-vashisth · 2020-02-06T11:02:41Z

> @nishantvas of course, can you create a pull request from the beta branch, or tell me what you think? Thanks

There are a couple of code smells, and some Python practices could be improved, but it depends on whether you intend to streamline the Python code in app.py or have left it as is because it's essentially a wrapper around the actual allure commands called via .sh files.

I can create a PR by "productionalizing" the code if you say so.

Apart from that, one thing that will matter is in generate_report():
don't call files = os.listdir(RESULTS_DIRECTORY) if API_RESPONSE_LESS_VERBOSE is requested.

Since the directory can hold over 200K files in my case, this takes too much memory and too much time.

And in send_results, the same loop can be reused to speed up the API somewhat, saving the memory spent on extra lists that are not required when API_RESPONSE_LESS_VERBOSE is requested.
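To illustrate the suggestion, here is a minimal sketch, assuming a helper that builds the part of the response describing the results folder. `summarize_results` and the `files_count` key are hypothetical names and are not taken from app.py; the point is only that the less-verbose path never materializes the list of names.

```python
import os


def summarize_results(results_dir, less_verbose):
    """Build a response payload describing the files in `results_dir`."""
    with os.scandir(results_dir) as entries:
        if less_verbose:
            # Count lazily: no list of file names is ever held in memory.
            count = sum(1 for entry in entries if entry.is_file())
            return {"files_count": count}
        # Verbose path: the caller asked for the names, so build the list.
        files = sorted(entry.name for entry in entries if entry.is_file())
    return {"files_count": len(files), "files": files}
```

How the service turns the API_RESPONSE_LESS_VERBOSE variable into the `less_verbose` flag is left out on purpose, since that parsing is not shown in this thread.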

nishant-vashisth · 2020-02-06T11:03:23Z

> @nishantvas Maybe you have to use 15 or NONE, but then you have to execute the generate-report endpoint after a certain amount of time.

For now, I use NONE, since I can't have the service generating the report on its own: it's a very expensive task with this many files.

nishant-vashisth · 2020-02-06T11:31:26Z

There is an issue with the build.
It's probably because you're pasting the history into the root folder. Docker images have root access, though, so I'm not sure what's wrong here.

ALLURE_VERSION: 2.13.1
Generating default report
Not checking results automatically
Generating report

Serving Flask app "app" (lazy loading)
Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
Debug mode: off
Running on http://0.0.0.0:5050/ (Press CTRL+C to quit)
Report successfully generated to allure-report
127.0.0.1 - - [06/Feb/2020 10:29:33] "GET /emailable-report/render HTTP/1.1" 200 -
Status: 200
Starting web server...
2020-02-06 10:29:34.910:INFO::main: Logging initialized @1278ms to org.eclipse.jetty.util.log.StdErrLog
Can not open browser because this capability is not supported on your platform. You can use the link below to open the report manually.
Server started at http://10.19.0.26:4040/. Press <Ctrl+C> to exit
10.19.0.65 - - [06/Feb/2020 10:35:46] "GET /version HTTP/1.1" 200 -
10.19.0.65 - - [06/Feb/2020 10:36:10] "GET /version HTTP/1.1" 200 -
Creating history on results directory...
Copying history from previous results...
cp: cannot create regular file '/app/allure-results/history/./retry-trend.json': Permission denied
cp: cannot create regular file '/app/allure-results/history/./history.json': Permission denied
cp: cannot create regular file '/app/allure-results/history/./history-trend.json': Permission denied
cp: cannot create regular file '/app/allure-results/history/./categories-trend.json': Permission denied
cp: cannot create regular file '/app/allure-results/history/./duration-trend.json': Permission denied
cp: preserving times for '/app/allure-results/history/.': Operation not permitted
Generating report
Report successfully generated to allure-report
127.0.0.1 - - [06/Feb/2020 10:36:45] "GET /emailable-report/render HTTP/1.1" 200 -
Status: 200
10.19.0.65 - - [06/Feb/2020 10:36:45] "GET /generate-report HTTP/1.1" 200 -

fescobar · 2020-02-06T11:51:30Z

@nishantvas Ok, let me resolve that. After that, I will let you make some changes to improve it.

fescobar · 2020-02-06T12:29:31Z

@nishantvas How can I reproduce this?
#58 (comment)
Are you sure you are using the beta image? Make sure to remove any other version from this container.

Also verify that you are not running as the root user. Use the default user or 1000:1000.

fescobar · 2020-02-06T13:30:55Z

I've re-released the beta version; if you want to try it, remove your local beta images and pull again.
I can't reproduce this: cp: cannot create regular file '/app/allure-results/history/./retry-trend.json': Permission denied. Maybe those files were created by the root user with previous versions of the container, and with the new change you now have this permission problem.
If you want, you can try with user: root

    user: root
    environment:
       ...............

nishant-vashisth · 2020-02-07T09:28:43Z

I have this deployed on a Kubernetes cluster with a mounted volume. If you have changed the default user of the container, it is entirely possible that the files and folders previously created by the root user of version 2.13.0 cannot be overwritten.

I've tried running this in another cluster and locally, and it works. I'll fiddle with the service and see how I can purge the volumes (they're a bit messy to get into).
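One way to check the ownership theory above on a mounted volume is to list the paths that the current user cannot write to. This is a small diagnostic sketch only; `find_unwritable` is a hypothetical helper and not part of the service.

```python
import os


def find_unwritable(directory):
    """Return the paths under `directory` that the current user cannot write to."""
    blocked = []
    for root, dirs, files in os.walk(directory):
        for name in dirs + files:
            path = os.path.join(root, name)
            # os.access tests with the real uid/gid of the running process.
            if not os.access(path, os.W_OK):
                blocked.append(path)
    return sorted(blocked)
```

Run inside the container as its default user against /app/allure-results, a non-empty result would point to files left behind by a different user.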

fescobar · 2020-02-07T11:30:23Z

@nishantvas yes, I can reproduce it. That's the problem. I will see what I can do to avoid breaking existing volumes.

Steps to reproduce:

Generate results with 2.13.0
Mount same volume with the results from the previous step using beta version
Clean results using API

Result:

allure_1  | Cleaning results
allure_1  | ls: cannot access '/app/allure-results/008801a1-772a-4b42-80d0-8d2ae00ba026-attachment.webm': No such file or directory
allure_1  | ls: cannot access '/app/allure-results/300b8cc0-bd65-4305-9e31-18ff278fbb42-attachment.webm': No such file or directory
allure_1  | ls: cannot access '/app/allure-results/5df696b0-c0f8-449d-900f-608b2661ff20-result.json': No such file or directory
allure_1  | ls: cannot access '/app/allure-results/63d2a611-5ba8-4422-9ac6-fd10a1b87976-result.json': No such file or directory
allure_1  | ls: cannot access '/app/allure-results/2f0418ca-c5fb-4401-9ab6-55bbe1802d7a-attachment.webm': No such file or directory
allure_1  | ls: cannot access '/app/allure-results/15031441-064e-4e08-8578-f230bcefdc87-container.json': No such file or directory
allure_1  | ls: cannot access '/app/allure-results/310a758f-596e-47ff-8b54-3d5167a860a5-container.json': No such file or directory
allure_1  | ls: cannot access '/app/allure-results/6e168492-4522-4c50-91a8-c5853c75d7cd-result.json': No such file or directory

Working on the fix.

fescobar · 2020-02-12T00:12:44Z

@nishantvas the problem wasn't related to permissions; it was related to concurrency. You just needed to execute the /clean-results endpoint in parallel to reproduce it.

I've implemented another fix to delete all files. I think this is better. I've re-released beta again. Remove your previous beta images.

Can you try with this new version frankescobar/allure-docker-service:beta?
Thank you so much for your patience. I will be waiting for your feedback.
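The concurrency explanation above matches the earlier `No such file or directory` lines: when two clean-up requests run in parallel, one can list a file that the other has already deleted. The thread does not say how the fix was implemented, so the following is only a sketch of one common way to make such a clean-up safe to run concurrently, by treating an already-missing file as success. `clean_tolerant` is a hypothetical name.

```python
import os


def clean_tolerant(directory):
    """Delete regular files in `directory`, ignoring ones that already vanished.

    Returns the number of files this call removed itself.
    """
    removed = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            try:
                if entry.is_file(follow_symlinks=False):
                    os.unlink(entry.path)
                    removed += 1
            except FileNotFoundError:
                # Another cleaner deleted it first; that is the desired end state.
                continue
    return removed
```

An alternative design would be to serialize clean-up requests behind a lock, which avoids the race entirely at the cost of making parallel callers wait.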

nishant-vashisth · 2020-02-12T11:04:46Z

Thanks so much, this does seem to be working now.
Once you merge this, can you let me know which new version it has been released in?

fescobar · 2020-02-12T11:05:56Z

@nishantvas I will run some tests and I will deploy it in a few mins. I will let you know.

fescobar · 2020-02-12T11:35:33Z

@nishantvas I've redeployed version 2.13.1 and latest with the fix. Remove your previous local images so that you pull the overwritten image. Thanks for everything.

fescobar added the bug and enhancement labels and removed the bug label Jan 23, 2020

fescobar added a commit that referenced this issue Feb 5, 2020

#58 - Fix - Too many files causes the logging and the shell scripts to crash (0f4fc57)

fescobar added a commit that referenced this issue Feb 5, 2020

#58 - Fix - Too many files causes the logging and the shell scripts to crash (cf7a43e)

fescobar added a commit that referenced this issue Feb 6, 2020

#58 - Fix - Too many files causes the logging and the shell scripts to crash (fd6f511)

fescobar added a commit that referenced this issue Feb 11, 2020

#58 - Fix - Too many files causes the logging and the shell scripts to crash (a7305e8)

fescobar added a commit that referenced this issue Feb 12, 2020

Merge pull request #61 from fescobar/beta (005f7ef)

#58 - Fix - Too many files causes the logging and the shell scripts to crash

fescobar closed this as completed Feb 12, 2020

fescobar added the released Released label May 31, 2020

fescobar mentioned this issue Dec 1, 2020

clean_results doesn't work with a large amount of results #136

Closed
