
analyze.py error: "max() arg is an empty sequence" #15

Closed

olets opened this issue Apr 15, 2021 · 6 comments

@olets
Contributor

olets commented Apr 15, 2021

The recent updates resolved the errors I was getting in two repos!

But they unearthed a new error in one repo. Until yesterday I was getting this error:

Parse data files, perform aggregation and analysis, generate Markdown report and render as HTML
210412-23:45:08.034 INFO: Remove output directory: latest-report
210412-23:45:08.035 INFO: Create output directory: latest-report
210412-23:45:08.869 INFO: generated new fontManager
210412-23:45:09.041 INFO: fetch stargazer time series for repo olets/nitro-zsh-completions
210412-23:45:09.264 INFO: GH request limit before fetch operation: 4849
210412-23:45:09.421 INFO: GH request limit after fetch operation: 4848
210412-23:45:09.421 INFO: http requests made (approximately): 1
210412-23:45:09.422 INFO: stargazer count: 1
210412-23:45:09.467 INFO: stars_cumulative, raw data: time
2021-03-28 22:03:09+00:00    1
Name: stars_cumulative, dtype: int64
210412-23:45:09.468 INFO: len(series): 1
210412-23:45:09.468 INFO: resample series into 1d bins
210412-23:45:09.472 INFO: len(series): 1
210412-23:45:09.472 INFO: stars_cumulative, for CSV file (resampled):                            stars_cumulative
time                                       
2021-03-28 00:00:00+00:00                 1
210412-23:45:09.477 INFO: write aggregate to ghrs-data/views_clones_aggregate.csv
210412-23:45:09.480 INFO: fetch fork time series for repo olets/nitro-zsh-completions
210412-23:45:09.701 INFO: GH request limit before fetch operation: 4847
210412-23:45:09.827 INFO: GH request limit after fetch operation: 4846
210412-23:45:09.827 INFO: http requests made (approximately): 1
210412-23:45:09.827 INFO: current fork count: 0
210412-23:45:09.835 INFO: len(series): 0
210412-23:45:09.835 INFO: resample series into 1d bins
210412-23:45:09.836 INFO: len(series): 0
210412-23:45:09.837 INFO: forks_cumulative, for CSV file (resampled): Empty DataFrame
Columns: [forks_cumulative]
Index: []
210412-23:45:09.837 INFO: write aggregate to ghrs-data/forks.csv
210412-23:45:09.839 INFO: read views/clones time series fragments (CSV docs)
210412-23:45:09.840 INFO: number of CSV files discovered for *_views_clones_series_fragment.csv: 1
210412-23:45:09.840 INFO: attempt to parse ghrs-data/snapshots/2021-04-12_234505_views_clones_series_fragment.csv
210412-23:45:09.841 INFO: parsed timestamp from path: 2021-04-12 23:45:05+00:00
Traceback (most recent call last):
  File "/analyze.py", line 1398, in <module>
    main()
  File "/analyze.py", line 82, in main
    analyse_view_clones_ts_fragments()
  File "/analyze.py", line 691, in analyse_view_clones_ts_fragments
    if df.index.max() > snapshot_time:
TypeError: '>' not supported between instances of 'float' and 'datetime.datetime'
+ ANALYZE_ECODE=1
error: analyze.py returned with code 1 -- exit.
+ set -e
+ set +x

Now instead I get:

Parse data files, perform aggregation and analysis, generate Markdown report and render as HTML
+ python /analyze.py --resources-directory /resources --output-directory latest-report --outfile-prefix '' --stargazer-ts-resampled-outpath ghrs-data/stargazers.csv --fork-ts-resampled-outpath ghrs-data/forks.csv --views-clones-aggregate-outpath ghrs-data/views_clones_aggregate.csv --views-clones-aggregate-inpath ghrs-data/views_clones_aggregate.csv --delete-ts-fragments olets/nitro-zsh-completions ghrs-data/snapshots
210414-23:42:48.927 INFO: Remove output directory: latest-report
210414-23:42:48.927 INFO: Create output directory: latest-report
210414-23:42:49.691 INFO: generated new fontManager
210414-23:42:49.847 INFO: fetch stargazer time series for repo olets/nitro-zsh-completions
210414-23:42:49.969 INFO: GH request limit before fetch operation: 4900
210414-23:42:50.095 INFO: GH request limit after fetch operation: 4899
210414-23:42:50.096 INFO: http requests made (approximately): 1
210414-23:42:50.096 INFO: stargazer count: 1
210414-23:42:50.102 INFO: stars_cumulative, raw data: time
2021-03-28 22:03:09+00:00    1
Name: stars_cumulative, dtype: int64
210414-23:42:50.103 INFO: len(series): 1
210414-23:42:50.103 INFO: resample series into 1d bins
210414-23:42:50.107 INFO: len(series): 1
210414-23:42:50.108 INFO: stars_cumulative, for CSV file (resampled):                            stars_cumulative
time                                       
2021-03-28 00:00:00+00:00                 1
210414-23:42:50.112 INFO: write aggregate to ghrs-data/views_clones_aggregate.csv
210414-23:42:50.115 INFO: fetch fork time series for repo olets/nitro-zsh-completions
210414-23:42:50.743 INFO: GH request limit before fetch operation: 4898
210414-23:42:51.145 INFO: GH request limit after fetch operation: 4897
210414-23:42:51.145 INFO: http requests made (approximately): 1
210414-23:42:51.145 INFO: current fork count: 0
210414-23:42:51.148 INFO: len(series): 0
210414-23:42:51.148 INFO: resample series into 1d bins
210414-23:42:51.149 INFO: len(series): 0
210414-23:42:51.150 INFO: forks_cumulative, for CSV file (resampled): Empty DataFrame
Columns: [forks_cumulative]
Index: []
210414-23:42:51.150 INFO: write aggregate to ghrs-data/forks.csv
210414-23:42:51.154 INFO: read views/clones time series fragments (CSV docs)
210414-23:42:51.154 INFO: number of CSV files discovered for *_views_clones_series_fragment.csv: 1
210414-23:42:51.154 INFO: attempt to parse ghrs-data/snapshots/2021-04-14_234245_views_clones_series_fragment.csv
210414-23:42:51.155 INFO: parsed timestamp from path: 2021-04-14 23:42:45+00:00
210414-23:42:51.158 WARNING: empty dataframe parsed from ghrs-data/snapshots/2021-04-14_234245_views_clones_series_fragment.csv, skip
210414-23:42:51.158 INFO: total sample count: 0
Traceback (most recent call last):
  File "/analyze.py", line 1409, in <module>
    main()
  File "/analyze.py", line 82, in main
    analyse_view_clones_ts_fragments()
  File "/analyze.py", line 717, in analyse_view_clones_ts_fragments
    newest_snapshot_time = max(df.attrs["snapshot_time"] for df in dfs)
ValueError: max() arg is an empty sequence
+ ANALYZE_ECODE=1
+ set -e
+ set +x
error: analyze.py returned with code 1 -- exit.
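For context, both exceptions come down to empty inputs: an empty DataFrame's `index.max()` comes back as NaN (a plain float), which cannot be compared to a datetime, and the built-in `max()` refuses an empty sequence outright. A minimal standalone reproduction, using only the standard library (the NaN stands in for what pandas returns):

```python
import datetime
import math

# An empty index's .max() is NaN, a float. Comparing that to a
# datetime raises the first error seen above.
nan_index_max = math.nan  # stands in for empty_df.index.max()
snapshot_time = datetime.datetime(2021, 4, 12, 23, 45, 5)
try:
    nan_index_max > snapshot_time
except TypeError as exc:
    print(f"TypeError: {exc}")

# Once empty fragments are skipped, the list of parsed dataframes can
# itself be empty, and max() over an empty generator raises the
# second error.
dfs = []
try:
    max(df for df in dfs)
except ValueError as exc:
    print(f"ValueError: {exc}")
```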

I don't use Python much, so I haven't looked for the bug :)

I can make you a repo collaborator if that's helpful — I think that would let you run the action on the repo?

@jgehrcke
Owner

jgehrcke commented Apr 15, 2021

Should really have waited for your feedback before making that release, haha. Will look into it soon, probably not today anymore (the bug is already obvious to me and the fix is quick -- it's just difficult to test these edge cases in advance, so your report is hugely valuable once again!)
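The kind of guard that avoids both failure modes might look like this. This is a hypothetical sketch, not the actual patch to analyze.py; `newest_snapshot` and the data shapes are made up, with a plain list of timestamps standing in for a dataframe index:

```python
import datetime


def newest_snapshot(fragments):
    """Sketch of the guard. `fragments` is a list of
    (sample_times, snapshot_time) pairs; sample_times stands in for a
    dataframe's time index. Empty fragments are skipped up front, and
    max() is only called if at least one fragment survived."""
    kept = []
    for sample_times, snapshot_time in fragments:
        if not sample_times:
            # Empty fragment: index.max() would be NaN here, so skip
            # before any timestamp comparison happens.
            continue
        if max(sample_times) > snapshot_time:
            # Sample newer than snapshot time -- would be logged as a
            # warning in the real code.
            pass
        kept.append(snapshot_time)
    if not kept:
        # All fragments were empty: nothing to aggregate, and calling
        # max() on an empty sequence would raise ValueError.
        return None
    return max(kept)
```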

@jgehrcke
Owner

> I think that would let you run the action on the repo?

Would appreciate that, if that's OK with you! I'll still have to see if I'm courageous enough to actually run that in your repo, but for the testing work it of course saves both of us some time!

jgehrcke added a commit that referenced this issue Apr 15, 2021
Fix for scenario where all views/clones fragments are empty (#15)
@jgehrcke
Owner

Landed #16 which should address the problem shown.

@jgehrcke
Owner

jgehrcke commented Apr 16, 2021

Btw @olets, if you'd like to, you can remove empty CSV files from your data repository by just pushing a corresponding commit manually. In your case this is ghrs-data/snapshots/2021-04-14_234245_views_clones_series_fragment.csv. I understand though if you don't care, and you can also leave it in there : ).
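For example, from a checkout of the data repository (the commit message here is just a suggestion; the path is the one from the log above):

```shell
git rm ghrs-data/snapshots/2021-04-14_234245_views_clones_series_fragment.csv
git commit -m "Remove empty views/clones snapshot fragment"
git push
```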

What I like here is the concept of being able to manually fix/change/tweak the state of the data via a simple git workflow, leaving a transparent history of changes. I think that's a conceptual strength of this solution (compared to e.g. having to 'fix' the state of an object in S3 or in a database).

Sorry for the annoyances -- young software, limited testing and all.

@olets
Contributor Author

olets commented Apr 16, 2021

> should address the problem shown

It works! 🙌

> Would appreciate that if that's OK for you!

I'll set it up if there's a problem again 👍 That data repo's project is deprecated. Now that we don't need it for debugging, I'm going to freeze the repo anyway!

> Sorry for the annoyances

Not annoying at all!

@jgehrcke
Owner

Thank you for the feedback @olets as always!!
