Too many files causes the logging and the shell scripts to crash #58

Closed
nishant-vashisth opened this issue Jan 23, 2020 · 23 comments
Labels: bug, enhancement, released

Comments

@nishant-vashisth

nishant-vashisth commented Jan 23, 2020

When there are too many files in the allure-results folder (over 50K, generated by 3000 tests across 5-10 runs), the clean-results and clean-history APIs fail:

Cleaning results
/app/cleanAllureResults.sh: line 5: /usr/bin/find: Argument list too long

I think it's unreasonable to expect even Linux to handle this many files at once, but is there any way this can be solved?

Would it be possible to batch the files into multiple folders?
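
For context, "Argument list too long" is the error the kernel returns when a command is started with more arguments than it accepts, which typically happens when a shell glob expands to tens of thousands of paths. The contents of cleanAllureResults.sh are not shown in this thread, so the following is only a minimal Python sketch of one way around the limit: delete the files in bounded batches without ever building a long command line. The function name, the batch size, and the choice to keep a history entry are assumptions for illustration.

```python
import os
from itertools import islice


def delete_files_in_batches(directory, batch_size=1000, keep=("history",)):
    """Delete regular files in `directory` in bounded batches.

    Hypothetical helper: nothing here is taken from the service's code.
    Returns the number of files this call removed.
    """
    removed = 0
    while True:
        # Re-open the directory for every batch so that at most
        # `batch_size` paths are held in memory at any time.
        with os.scandir(directory) as entries:
            candidates = (
                entry.path
                for entry in entries
                if entry.name not in keep
                and entry.is_file(follow_symlinks=False)
            )
            batch = list(islice(candidates, batch_size))
        if not batch:
            return removed
        for path in batch:
            try:
                os.remove(path)
                removed += 1
            except FileNotFoundError:
                # Something else already removed it; nothing left to do.
                pass
```

Because each file is removed with its own os.remove call, the size of the folder never turns into the size of an argument list.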

Secondly, as the number of files in the folder increases, the response that lists the file names becomes unwieldy.
The web service has to transfer too much data to return a list of 10,000-plus filenames in the response.

If the list needs to be returned, there should be an env variable controlling how verbose the response is.
Would it make sense to return only the count of the files present? That falls back to the issue mentioned above, though: listing so many files almost crashes the Python process by exhausting memory.
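
One way to report only a count without the memory cost described above is to stream the directory entries and never keep their names. A minimal sketch, assuming a plain directory of result files; count_result_files is a hypothetical name, not a function from the service:

```python
import os


def count_result_files(directory):
    """Count regular files without building a list of their names."""
    with os.scandir(directory) as entries:
        # The generator is consumed one entry at a time, so no list of
        # names is ever held in memory.
        return sum(1 for entry in entries if entry.is_file())
```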

@fescobar
Owner

@nishantvas OK, let me reproduce that and think about what the best solution would be. Thanks.

@nishant-vashisth
Author

Great, let me know if I can provide more details.
For now, I simply clean the results before I start publishing results from any new run,
so there are at most 8-9K files at any given time.

This way, I keep the history but lose the individual run details.

@fescobar added the bug and enhancement labels and removed the bug label on Jan 23, 2020
@fescobar
Owner

@nishantvas Can you attach the full log from the container, please?

@nishant-vashisth
Author

Not checking results automatically
ALLURE_VERSION: 2.13.0
Generating default report
Generating report

     * Serving Flask app "app" (lazy loading)
     * Environment: production
       WARNING: This is a development server. Do not use it in a production deployment.
       Use a production WSGI server instead.
     * Debug mode: off
     * Running on http://0.0.0.0:5050/ (Press CTRL+C to quit)
    10.19.0.65 - - [22/Jan/2020 12:14:41] "GET /swagger/ HTTP/1.1" 200 -
    10.19.0.65 - - [22/Jan/2020 12:14:41] "GET /swagger/swagger-ui.css HTTP/1.1" 200 -
    10.19.0.65 - - [22/Jan/2020 12:14:41] "GET /swagger/swagger-ui-bundle.js HTTP/1.1" 200 -
    10.19.0.65 - - [22/Jan/2020 12:14:41] "GET /swagger/swagger-ui-standalone-preset.js HTTP/1.1" 200 -
    10.19.0.65 - - [22/Jan/2020 12:14:41] "GET /static/swagger.json HTTP/1.1" 200 -
    10.19.0.65 - - [22/Jan/2020 12:14:41] "GET /swagger/favicon-32x32.png HTTP/1.1" 200 -
    Report successfully generated to allure-report
    127.0.0.1 - - [22/Jan/2020 12:14:56] "GET /emailable-report/render HTTP/1.1" 200 -
    Status: 200
    Starting web server...
    2020-01-22 12:14:57.575:INFO::main: Logging initialized @750ms to org.eclipse.jetty.util.log.StdErrLog
    Can not open browser because this capability is not supported on your platform. You can use the link below to open the report manually.
    Server started at http://10.19.0.13:4040/. Press <Ctrl+C> to exit
    Cleaning results
    /app/cleanAllureResults.sh: line 5: /usr/bin/find: Argument list too long
    Creating history on results directory...
    Copying history from previous results...
    Generating report
    Report successfully generated to allure-report
    10.19.0.65 - - [22/Jan/2020 12:16:14] "GET /clean-results HTTP/1.1" 400 -
    127.0.0.1 - - [22/Jan/2020 12:16:46] "GET /emailable-report/render HTTP/1.1" 200 -
    Status: 200
    10.19.0.65 - - [22/Jan/2020 12:16:46] "GET /clean-results HTTP/1.1" 200 -
    Cleaning results
    Creating history on results directory...
    Copying history from previous results...
    Generating report
    Report successfully generated to allure-report
    127.0.0.1 - - [22/Jan/2020 12:17:08] "GET /emailable-report/render HTTP/1.1" 200 -
    Status: 200
    10.19.0.65 - - [22/Jan/2020 12:17:08] "GET /clean-results HTTP/1.1" 200 -
    Cleaning history
    Creating history on results directory...
    Copying history from previous results...
    Generating report
    Report successfully generated to allure-report
    127.0.0.1 - - [22/Jan/2020 12:17:36] "GET /emailable-report/render HTTP/1.1" 200 -
    Status: 200
    10.19.0.65 - - [22/Jan/2020 12:17:36] "GET /clean-history HTTP/1.1" 200 -
    10.19.0.65 - - [22/Jan/2020 12:31:40] "GET /version HTTP/1.1" 200 -
    10.19.0.65 - - [22/Jan/2020 12:32:00] "GET /version HTTP/1.1" 200 -
    10.19.0.65 - - [22/Jan/2020 12:32:01] "POST /send-results HTTP/1.1" 200 -
    10.19.0.65 - - [22/Jan/2020 12:32:01] "POST /send-results HTTP/1.1" 200 -
    10.19.0.65 - - [22/Jan/2020 12:32:02] "POST /send-results HTTP/1.1" 200 -
    10.19.0.65 - - [22/Jan/2020 12:32:02] "POST /send-results HTTP/1.1" 200 -
    10.19.0.65 - - [22/Jan/2020 12:32:02] "POST /send-results HTTP/1.1" 200 -
    10.19.0.65 - - [22/Jan/2020 12:32:02] "POST /send-results HTTP/1.1" 200 -
    10.19.0.65 - - [22/Jan/2020 12:32:02] "POST /send-results HTTP/1.1" 200 -
    10.19.0.65 - - [22/Jan/2020 12:32:03] "POST /send-results HTTP/1.1" 200 -
    10.19.0.65 - - [22/Jan/2020 12:32:03] "POST /send-results HTTP/1.1" 200 -
    10.19.0.65 - - [22/Jan/2020 12:32:03] "POST /send-results HTTP/1.1" 200 -

@fescobar
Owner

fescobar commented Feb 5, 2020

@nishantvas I think I have an idea. Working on this.

@fescobar
Owner

fescobar commented Feb 5, 2020

@nishantvas I've recently released a beta version with some changes to support more files. Could you please check your case with this image: "frankescobar/allure-docker-service:beta"? (Remove all your local allure images first, just in case.) You can also use a new env variable to reduce the verbosity of the API response:

    environment:
      API_RESPONSE_LESS_VERBOSE: 1

Please let me know whether this version resolves your problem. If it fails, please attach a log.

@nishant-vashisth
Author

Sure, I'll deploy this and let you know if I can reproduce the issue.
However, would you happen to know approximately how many files this can handle?

The way our tests are run, the folder can easily grow by 100K files in less than a couple of days.
There must be some limit.

@fescobar
Owner

fescobar commented Feb 6, 2020

@nishantvas I don't know what the limit is, but I'm no longer searching for the files to delete. Instead, I store the history folder in a temporary directory and delete everything when you clean the results. It would help if you could tell me the size of the history directory in allure-results when you have 100K results. Please also check whether there is a performance problem when you clean the results.
I'm not able to reproduce your scenario with real data.
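
The approach described here (set the history aside, delete everything, put the history back) can be sketched in Python. This is an illustration of the idea only, not the shell code in the beta image; the function name is hypothetical, and it assumes the results directory is an ordinary directory rather than a volume mount point, which cannot itself be removed.

```python
import os
import shutil
import tempfile


def clean_results_keep_history(results_dir, history_name="history"):
    """Empty `results_dir` but preserve its history subdirectory."""
    history_dir = os.path.join(results_dir, history_name)
    with tempfile.TemporaryDirectory() as tmp:
        saved = os.path.join(tmp, history_name)
        had_history = os.path.isdir(history_dir)
        if had_history:
            # Park the history outside the directory that is wiped.
            shutil.move(history_dir, saved)
        # Removing the whole tree never enumerates files on a command line.
        shutil.rmtree(results_dir)
        os.makedirs(results_dir)
        if had_history:
            shutil.move(saved, history_dir)
```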

@nishant-vashisth
Author

@fescobar, this does work for me, and setting the env var helps a bit, but creating that many files will take a few test runs.

Can I suggest some changes to the code?
It can be made faster with some tweaks.

@fescobar
Owner

fescobar commented Feb 6, 2020

@nishantvas Of course. Can you create a pull request based on the beta branch, or tell me what you have in mind? Thanks.

@fescobar
Owner

fescobar commented Feb 6, 2020

@nishantvas Maybe you should use 15 or NONE, but with NONE you have to call the generate-report endpoint yourself after a certain amount of time.

    environment:
      CHECK_RESULTS_EVERY_SECONDS: NONE

    environment:
      CHECK_RESULTS_EVERY_SECONDS: 15

https://github.com/fescobar/allure-docker-service#updating-seconds-to-check-allure-results
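
With NONE the service does not regenerate the report on its own (the container log earlier in this thread prints "Not checking results automatically"), so the client has to request generation. A minimal sketch of both sides of that setting, under assumptions: parse_check_interval is a hypothetical illustration of how such a value could be interpreted, not the service's parser, and the base URL passed to trigger_report is a placeholder. Only the GET /generate-report path is taken from the logs in this thread.

```python
import urllib.request


def parse_check_interval(raw):
    """Return None for "NONE" (no automatic checks), else seconds as an int."""
    value = raw.strip()
    if value.upper() == "NONE":
        return None
    seconds = int(value)
    if seconds <= 0:
        raise ValueError("interval must be a positive number of seconds")
    return seconds


def trigger_report(base_url, timeout=600):
    """Ask the service to generate the report; returns the HTTP status code."""
    url = base_url.rstrip("/") + "/generate-report"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status
```

A client could call trigger_report once after a test run has finished uploading its results, instead of having the service poll a very large folder.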

@nishant-vashisth
Author

> @nishantvas Of course. Can you create a pull request based on the beta branch, or tell me what you have in mind? Thanks.

There are a couple of code smells and places where better Python practices would help, but it depends on whether you intend to streamline the Python code in app.py or have left it as is because it's essentially a wrapper around the actual allure commands called via .sh files.

I can create a PR "productionizing" the code if you'd like.

Apart from that, one thing that will matter in generate_report():
don't call files = os.listdir(RESULTS_DIRECTORY) if API_RESPONSE_LESS_VERBOSE is requested.

Since the directory can hold over 200K files in my case, this takes too much memory and too much time.

And in send_results, the same loop can be used to speed up the API somewhat by not keeping the extra lists in memory, which are not required if API_RESPONSE_LESS_VERBOSE is requested.
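
The suggestion above could look roughly like this. It is a sketch under assumptions, not the code in app.py: the function names and the response field names are hypothetical, and treating the value "1" as enabled is inferred from the API_RESPONSE_LESS_VERBOSE: 1 example earlier in the thread.

```python
import os


def less_verbose_requested(environ=None):
    """True when API_RESPONSE_LESS_VERBOSE is set to "1" (assumed convention)."""
    environ = os.environ if environ is None else environ
    return environ.get("API_RESPONSE_LESS_VERBOSE", "0") == "1"


def build_report_response(results_dir, less_verbose):
    """Build a response body, listing files only when verbosity is wanted."""
    response = {"message": "Report successfully generated"}
    if not less_verbose:
        # Only pay for the directory listing when the caller wants it.
        response["files"] = os.listdir(results_dir)
    return response
```

In the less verbose branch the directory is never read, so the cost of the response no longer depends on how many result files exist.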

@nishant-vashisth
Author

> @nishantvas Maybe you should use 15 or NONE, but with NONE you have to call the generate-report endpoint yourself after a certain amount of time.

For now, I use NONE, since I can't have the service generating the report on its own; it's a very expensive task with this many files.

@nishant-vashisth
Author

There is an issue with the build.
It's probably because you're copying the history into the root folder. Docker images have root access, though, so I'm not sure what's wrong here.

ALLURE_VERSION: 2.13.1
Generating default report
Not checking results automatically
Generating report

     * Serving Flask app "app" (lazy loading)
     * Environment: production
       WARNING: This is a development server. Do not use it in a production deployment.
       Use a production WSGI server instead.
     * Debug mode: off
     * Running on http://0.0.0.0:5050/ (Press CTRL+C to quit)
    Report successfully generated to allure-report
    127.0.0.1 - - [06/Feb/2020 10:29:33] "GET /emailable-report/render HTTP/1.1" 200 -
    Status: 200
    Starting web server...
    2020-02-06 10:29:34.910:INFO::main: Logging initialized @1278ms to org.eclipse.jetty.util.log.StdErrLog
    Can not open browser because this capability is not supported on your platform. You can use the link below to open the report manually.
    Server started at http://10.19.0.26:4040/. Press <Ctrl+C> to exit
    10.19.0.65 - - [06/Feb/2020 10:35:46] "GET /version HTTP/1.1" 200 -
    10.19.0.65 - - [06/Feb/2020 10:36:10] "GET /version HTTP/1.1" 200 -
    Creating history on results directory...
    Copying history from previous results...
    cp: cannot create regular file '/app/allure-results/history/./retry-trend.json': Permission denied
    cp: cannot create regular file '/app/allure-results/history/./history.json': Permission denied
    cp: cannot create regular file '/app/allure-results/history/./history-trend.json': Permission denied
    cp: cannot create regular file '/app/allure-results/history/./categories-trend.json': Permission denied
    cp: cannot create regular file '/app/allure-results/history/./duration-trend.json': Permission denied
    cp: preserving times for '/app/allure-results/history/.': Operation not permitted
    Generating report
    Report successfully generated to allure-report
    127.0.0.1 - - [06/Feb/2020 10:36:45] "GET /emailable-report/render HTTP/1.1" 200 -
    Status: 200
    10.19.0.65 - - [06/Feb/2020 10:36:45] "GET /generate-report HTTP/1.1" 200 -

@fescobar
Owner

fescobar commented Feb 6, 2020

@nishantvas OK, let me resolve that. After that, you can make your changes to improve the code.

@fescobar
Owner

fescobar commented Feb 6, 2020

@nishantvas How can I reproduce this?
#58 (comment)
Are you sure you are using the beta image? Make sure to remove any other version of this container.

Also verify that you are not running as the root user. Use the default user or 1000:1000.

@fescobar
Owner

fescobar commented Feb 6, 2020

I've re-released the beta version. If you want to try it, remove your local beta images and pull again.
I can't reproduce this error: cp: cannot create regular file '/app/allure-results/history/./retry-trend.json': Permission denied. Maybe those files were created by the root user with previous versions of the container, and with the new change you are hitting this permission problem.
If you want, you can try with user: root

    user: root
    environment:
       ...............

@nishant-vashisth
Author

I have this deployed on a Kubernetes cluster with a mounted volume. If you have changed the default user of the container, it is entirely possible that the files and folders previously created by the root user of version 2.13.0 cannot be overwritten.

I've tried running this in another cluster and locally, and it works. I'll fiddle with the service and see how I can purge the volumes (they're a bit messy to get into).
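
To check whether leftover root-owned files on the volume are what cp is tripping over, one could list the entries whose owner differs from the user the container now runs as. A minimal sketch for a Unix-like system; the function name is hypothetical, and the UID 1000 in the usage note comes from the 1000:1000 suggestion above.

```python
import os


def entries_not_owned_by(directory, uid):
    """Yield paths directly under `directory` whose owner is not `uid`."""
    with os.scandir(directory) as entries:
        for entry in entries:
            # Inspect the entry itself, not the target of a symlink.
            if entry.stat(follow_symlinks=False).st_uid != uid:
                yield entry.path
```

For example, iterating over entries_not_owned_by("/app/allure-results/history", 1000) inside the container would print nothing if every history file already belongs to UID 1000.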

@fescobar
Owner

fescobar commented Feb 7, 2020

@nishantvas Yes, I can reproduce it. That's the problem. I will see what I can do to avoid breaking existing volumes.

Steps to reproduce:

  • Generate results with 2.13.0
  • Mount the same volume, containing the results from the previous step, with the beta version
  • Clean the results using the API

Result:

allure_1  | Cleaning results
allure_1  | ls: cannot access '/app/allure-results/008801a1-772a-4b42-80d0-8d2ae00ba026-attachment.webm': No such file or directory
allure_1  | ls: cannot access '/app/allure-results/300b8cc0-bd65-4305-9e31-18ff278fbb42-attachment.webm': No such file or directory
allure_1  | ls: cannot access '/app/allure-results/5df696b0-c0f8-449d-900f-608b2661ff20-result.json': No such file or directory
allure_1  | ls: cannot access '/app/allure-results/63d2a611-5ba8-4422-9ac6-fd10a1b87976-result.json': No such file or directory
allure_1  | ls: cannot access '/app/allure-results/2f0418ca-c5fb-4401-9ab6-55bbe1802d7a-attachment.webm': No such file or directory
allure_1  | ls: cannot access '/app/allure-results/15031441-064e-4e08-8578-f230bcefdc87-container.json': No such file or directory
allure_1  | ls: cannot access '/app/allure-results/310a758f-596e-47ff-8b54-3d5167a860a5-container.json': No such file or directory
allure_1  | ls: cannot access '/app/allure-results/6e168492-4522-4c50-91a8-c5853c75d7cd-result.json': No such file or directory

Working on the fix.

@fescobar
Owner

fescobar commented Feb 12, 2020

@nishantvas The problem wasn't related to permissions; it was related to concurrency. You just needed to call the /clean-results endpoint in parallel to reproduce it.

I've implemented another fix that deletes all files, which I think is better. I've re-released beta again. Remove your previous beta images.

Can you try this new version of frankescobar/allure-docker-service:beta?
Thank you so much for your patience. I will be waiting for your feedback.
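
The race described here, where two cleanups running at once each list the files and one then finds them already gone, can be illustrated in Python. This is a sketch of the general technique (serialize the cleanup and tolerate files that vanish), not the fix that was released; the names are hypothetical, and the lock only protects callers within a single process.

```python
import os
import threading

# One lock per process: concurrent requests take turns cleaning.
_clean_lock = threading.Lock()


def clean_results(results_dir):
    """Remove regular files in `results_dir`; safe to call concurrently."""
    removed = 0
    with _clean_lock:
        with os.scandir(results_dir) as entries:
            paths = [
                entry.path
                for entry in entries
                if entry.is_file(follow_symlinks=False)
            ]
        for path in paths:
            try:
                os.remove(path)
                removed += 1
            except FileNotFoundError:
                # Removed by another process between listing and deleting.
                pass
    return removed
```

Catching FileNotFoundError matters even with the lock, because a second process or container sharing the same volume is not covered by it.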

@nishant-vashisth
Author

nishant-vashisth commented Feb 12, 2020

Thanks so much, this does seem to be working now.
Once you merge this, can you let me know which version it has been released in?

@fescobar
Owner

@nishantvas I will run some tests and deploy it in a few minutes. I will let you know.

fescobar added a commit that referenced this issue Feb 12, 2020
 #58 - Fix - Too many files causes the logging and the shell scripts to crash
@fescobar
Owner

@nishantvas I've redeployed version 2.13.1 and latest with the fix. Remove your previous local images so that you pull the overwritten ones. Thanks for everything.
