Action fails when too many jobs trying to track different repos in the same data repo #11

Closed
ChameleonTartu opened this issue Apr 11, 2021 · 10 comments

Comments

@ChameleonTartu

This project looks amazing!

My idea was to track all of my public repos and analyze them once in a while. It looks like the action fails when too many jobs run at once, for instance when one job pushes to the data repo before another one does. My GitHub repo.

Also, there is another issue with amazon-mws-subscriptions-maven:

210411-19:09:08.177 INFO:MainThread: union-merge views and clones
Traceback (most recent call last):
  File "/fetch.py", line 314, in <module>
    main()
  File "/fetch.py", line 73, in main
    ) = fetch_all_traffic_api_endpoints(repo)
  File "/fetch.py", line 122, in fetch_all_traffic_api_endpoints
    df_views_clones = pd.concat([df_clones, df_views], axis=1, join="outer")
  File "/usr/local/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 285, in concat
    op = _Concatenator(
  File "/usr/local/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 467, in __init__
    self.new_axes = self._get_new_axes()
  File "/usr/local/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 537, in _get_new_axes
    return [
  File "/usr/local/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 538, in <listcomp>
    self._get_concat_axis() if i == self.bm_axis else self._get_comb_axis(i)
  File "/usr/local/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 544, in _get_comb_axis
    return get_objs_combined_axis(
  File "/usr/local/lib/python3.8/site-packages/pandas/core/indexes/api.py", line 92, in get_objs_combined_axis
    return _get_combined_index(obs_idxes, intersect=intersect, sort=sort, copy=copy)
  File "/usr/local/lib/python3.8/site-packages/pandas/core/indexes/api.py", line 145, in _get_combined_index
    index = union_indexes(indexes, sort=sort)
  File "/usr/local/lib/python3.8/site-packages/pandas/core/indexes/api.py", line 214, in union_indexes
    return result.union_many(indexes[1:])
  File "/usr/local/lib/python3.8/site-packages/pandas/core/indexes/datetimes.py", line 395, in union_many
    this, other = this._maybe_utc_convert(other)
  File "/usr/local/lib/python3.8/site-packages/pandas/core/indexes/datetimes.py", line 413, in _maybe_utc_convert
    raise TypeError("Cannot join tz-naive with tz-aware DatetimeIndex")
TypeError: Cannot join tz-naive with tz-aware DatetimeIndex

Another data frame issue:

210411-19:09:18.943 INFO: parsed timestamp from path: 2021-04-11 19:09:15+00:00
Traceback (most recent call last):
  File "/analyze.py", line 1398, in <module>
    main()
  File "/analyze.py", line 82, in main
    analyse_view_clones_ts_fragments()
  File "/analyze.py", line 691, in analyse_view_clones_ts_fragments
    if df.index.max() > snapshot_time:
TypeError: '>' not supported between instances of 'float' and 'datetime.datetime'
+ ANALYZE_ECODE=1
error: analyze.py returned with code 1 -- exit.

Git clone issue:

GHRS entrypoint.sh: pwd: /github/workspace
+ git clone 'https://ghactions:${' secrets.ACCESS_GITHUB_API_TOKEN '}@github.com/ChameleonTartu/buymeacoffee-repo-stats.git' .
length of API TOKEN: 36
fatal: Too many arguments.

All other issues are the same as those mentioned.

@ChameleonTartu
Author

@jgehrcke Let me know if I can help with more than just reporting this. It would be great to get all of this fixed so that I can use this tool more extensively, as I am planning to grow the number of repos beyond the current 34 over time. It is the most valuable tool I could find for tracking repo development over time. Thank you again!

@jgehrcke
Owner

jgehrcke commented Apr 14, 2021

Traceback (most recent call last):
  File "/fetch.py", line 314, in <module>
    main()
  File "/fetch.py", line 73, in main
    ) = fetch_all_traffic_api_endpoints(repo)
  File "/fetch.py", line 122, in fetch_all_traffic_api_endpoints
    df_views_clones = pd.concat([df_clones, df_views], axis=1, join="outer")
[...]
TypeError: Cannot join tz-naive with tz-aware DatetimeIndex

I could not quite make sense of this one. Both df_clones and df_views are created by the same code path. I thought this might be a misleading error raised when one of the two is empty, but no:

± python
Python 3.8.6 (default, Nov 22 2020, 17:14:35) 
[GCC 10.2.1 20201016 (Red Hat 10.2.1-6)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> tz_naive = pd.date_range('2018-03-01 09:00', periods=3)
>>> tz_aware = tz_naive.tz_localize(tz='US/Eastern')
>>> df_aware = pd.DataFrame(data={'lol': [1, 2, 3]}, index=tz_aware)
>>> df_aware
                           lol
2018-03-01 09:00:00-05:00    1
2018-03-02 09:00:00-05:00    2
2018-03-03 09:00:00-05:00    3
>>> df_empty = pd.DataFrame(data={}, index=[])
>>> pd.concat([df_aware, df_empty], axis=1, join="outer")
                           lol
2018-03-01 09:00:00-05:00    1
2018-03-02 09:00:00-05:00    2
2018-03-03 09:00:00-05:00    3

I am adding a patch that changes the way the DatetimeIndex is translated to a tz-aware object, which hopefully addresses this problem. It's a little disappointing to not understand it precisely.
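As a rough illustration of that idea (a sketch, not the actual patch): the hypothetical helper `with_utc_index` below normalizes each frame's index to a tz-aware UTC `DatetimeIndex` before concatenation, so that a tz-naive and a tz-aware index never meet in `pd.concat`. The sample data is made up.

```python
import pandas as pd


def with_utc_index(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df whose index is a tz-aware (UTC) DatetimeIndex.

    Hypothetical helper for illustration; it is not part of fetch.py.
    """
    idx = pd.DatetimeIndex(df.index)
    # Localize tz-naive timestamps; convert tz-aware ones to UTC.
    idx = idx.tz_localize("UTC") if idx.tz is None else idx.tz_convert("UTC")
    out = df.copy()
    out.index = idx
    return out


# Made-up sample data: one tz-naive frame, one tz-aware frame.
df_clones = pd.DataFrame(
    {"clones": [5]}, index=pd.DatetimeIndex(["2021-04-10"])
)
df_views = pd.DataFrame(
    {"views": [7]}, index=pd.DatetimeIndex(["2021-04-11"], tz="UTC")
)

df_views_clones = pd.concat(
    [with_utc_index(df_clones), with_utc_index(df_views)],
    axis=1,
    join="outer",
)
print(df_views_clones.index.tz)
```

Normalizing both inputs, rather than only the one suspected to be tz-naive, keeps the merge independent of which code path produced which frame.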

@jgehrcke
Owner

TypeError: '>' not supported between instances of 'float' and 'datetime.datetime'

That somewhat suggests that df_clones and df_views were structurally rather different from what's expected.

Update: empty index explains that error msg:

>>> df_empty.index.max() > datetime(year=2012, month=3, day=1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '>' not supported between instances of 'float' and 'datetime.datetime'
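One possible way to guard against this, sketched under the assumption that an empty frame should simply count as "nothing newer than the snapshot": check the index length before comparing. The function name `has_rows_newer_than` is hypothetical and not taken from analyze.py.

```python
from datetime import datetime, timezone

import pandas as pd


def has_rows_newer_than(df: pd.DataFrame, snapshot_time: datetime) -> bool:
    """Return True if df has at least one row newer than snapshot_time.

    Hypothetical helper: an empty index is handled up front, so max()
    is never compared against a datetime when there is no data.
    """
    if len(df.index) == 0:
        return False
    return bool(df.index.max() > snapshot_time)


snapshot_time = datetime(2021, 4, 11, 19, 9, 15, tzinfo=timezone.utc)

df_empty = pd.DataFrame(data={}, index=[])
df_views = pd.DataFrame(
    {"views": [7]}, index=pd.DatetimeIndex(["2021-04-12"], tz="UTC")
)

print(has_rows_newer_than(df_empty, snapshot_time))
print(has_rows_newer_than(df_views, snapshot_time))
```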

@jgehrcke
Owner

GHRS entrypoint.sh: pwd: /github/workspace
+ git clone 'https://ghactions:${' secrets.ACCESS_GITHUB_API_TOKEN '}@github.com/ChameleonTartu/buymeacoffee-repo-stats.git' .
length of API TOKEN: 36
fatal: Too many arguments.

Could it be that this token was actually truncated and/or maybe this is related to one of your code changes?

I notice secrets.ACCESS_GITHUB_API_TOKEN in that log, but with the current code the command should actually look very different:

git clone https://ghactions:${GHRS_GITHUB_API_TOKEN}@github.com/${DATA_REPOSPEC}.git 

When things work as expected, that should be the log pattern:

GHRS entrypoint.sh: pwd: /github/workspace
+ git clone ***github.com/jgehrcke/ghrs-test.git .
length of API TOKEN: 40
Cloning into '.'...

It's likely that the error message fatal: Too many arguments. was caused by the malformed git clone ... command.
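To illustrate how such a command could end up with too many arguments, here is a small sketch. It assumes, based on the failing log above, that the token variable held the literal text `${ secrets.ACCESS_GITHUB_API_TOKEN }` and that the shell split the unquoted URL on whitespace. Python's `str.split()` only stands in for shell word splitting here.

```python
# Assumption: the token variable contained this literal, unexpanded text.
token = "${ secrets.ACCESS_GITHUB_API_TOKEN }"
data_repospec = "ChameleonTartu/buymeacoffee-repo-stats"

# Same pattern as the git clone URL shown above.
url = f"https://ghactions:{token}@github.com/{data_repospec}.git"

# An unquoted shell expansion is split on whitespace; str.split() mimics that.
clone_args = url.split() + ["."]

print(len(token))       # 36, the "length of API TOKEN" in the failing log
print(len(clone_args))  # 4 arguments, where a repository and a directory are expected
for arg in clone_args:
    print(repr(arg))
```

The three URL fragments match the three quoted words in the `+ git clone ...` trace line of the failing run.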

jgehrcke added a commit that referenced this issue Apr 14, 2021
See issue #11.

This explicit index construction will create a
tz-aware DatetimeIndex even if the data is empty:


>>> pd.DatetimeIndex(data=[], tz="UTC")
DatetimeIndex([], dtype='datetime64[ns, UTC]', freq=None)
jgehrcke added a commit that referenced this issue Apr 14, 2021
Something fishy went on in the context of
what was reported in #11.
@jgehrcke
Owner

@ChameleonTartu would you mind retrying things with the current head of main? I think I've addressed all issues reported to date (maybe have a look at the changelog). Happy to cut a release, but ideally only after getting your confirmation that things indeed work.

@ChameleonTartu
Author

@jgehrcke I made a run: https://github.com/ChameleonTartu/buymeacoffee-repo-stats/actions/runs/748508227

The only use-case that doesn't work is:

GHRS entrypoint.sh: pwd: /github/workspace
+ git clone 'https://ghactions:${' secrets.ACCESS_GITHUB_API_TOKEN '}@github.com/ChameleonTartu/buymeacoffee-repo-stats.git' .
length of API TOKEN: 36
fatal: Too many arguments.

And all jobs failed with the same message: https://github.com/ChameleonTartu/buymeacoffee-repo-stats/runs/2343584927?check_suite_focus=true

I suspect that some repos were created a long time ago and therefore have a different API token format. Could that be the cause? Any idea?

@jgehrcke
Owner

jgehrcke commented Apr 14, 2021

The only use-case that doesn't work is:

OK, your workflow file is bad in a subtle way! Mean trap: https://github.com/ChameleonTartu/buymeacoffee-repo-stats/blob/b6d089f2bc01462e05fe8100ce1f27cfd3a24909/.github/workflows/stats.yml#L138

@ChameleonTartu you have ghtoken: ${ secrets.ACCESS_GITHUB_API_TOKEN }, but the curly braces need to come in pairs: ${{ ... }} -- in most jobs, you have that.
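For auto-generated workflow files, a quick scan can catch this trap. The following is a minimal sketch, not part of this project: the function name `find_single_brace_secrets` and the sample workflow text are made up for illustration, and the regex only looks for `secrets.` references written with a single opening brace.

```python
import re

# Matches "${ secrets." but not "${{ secrets.": after "${" there must be
# no second "{" (negative lookahead).
SINGLE_BRACE_SECRET = re.compile(r"\$\{(?!\{)\s*secrets\.")


def find_single_brace_secrets(workflow_text: str):
    """Return (line_number, stripped_line) for each suspicious line."""
    return [
        (number, line.strip())
        for number, line in enumerate(workflow_text.splitlines(), start=1)
        if SINGLE_BRACE_SECRET.search(line)
    ]


# Made-up workflow excerpt: the first job is fine, the second is not.
sample = """\
with:
  ghtoken: ${{ secrets.ACCESS_GITHUB_API_TOKEN }}
with:
  ghtoken: ${ secrets.ACCESS_GITHUB_API_TOKEN }
"""

for number, line in find_single_brace_secrets(sample):
    print(f"line {number}: {line}")
```

To check a real file, read it with `pathlib.Path(".github/workflows/stats.yml").read_text()` and pass the text to the function.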

@ChameleonTartu
Author

@jgehrcke Thank you! I didn't notice these nuances.

I auto-generated some of the jobs, so it looks like I got some of them wrong. Cool-cool-cool!

@jgehrcke
Owner

@ChameleonTartu ok : ) Please leave feedback again once the current head of main works for all your jobs : )

@ChameleonTartu
Author

@jgehrcke Everything works smoothly: https://github.com/ChameleonTartu/buymeacoffee-repo-stats/actions/runs/748788117

Amazing!
