Skip to content

Conversation

@bsipocz
Copy link
Member

@bsipocz bsipocz commented Nov 6, 2025

No description provided.

@bsipocz bsipocz added the infrastructure Issues relevant to infrasructure, rather than content label Nov 6, 2025
@bsipocz bsipocz force-pushed the ENH_switch_to_JB2 branch 2 times, most recently from c4ac7a7 to 627aafd Compare November 6, 2025 16:35
@bsipocz bsipocz closed this Nov 7, 2025
@bsipocz bsipocz reopened this Nov 7, 2025
@bsipocz
Copy link
Member Author

bsipocz commented Nov 7, 2025

Well, apparently having the make file would make sense so we run the same in conda and in tox.

OTOH, do we really still need conda, or should nuke it and redo env management in pixi instead?

@rossbar
Copy link
Collaborator

rossbar commented Nov 7, 2025

OTOH, do we really still need conda, or should nuke it and redo env management in pixi instead?

👍 this is my thought too - especially since the environment resolution takes longer than the notebook execution 🙃 . There exists tooling for this exact problem now so let's transition to it.

I'd propose to remove the conda job in this PR then re-do/beef up CI after updating tooling.

@bsipocz
Copy link
Member Author

bsipocz commented Nov 7, 2025

We're still seeing issues with the preview, so this should not be merged until that is solved.

@stefanv
Copy link
Contributor

stefanv commented Nov 8, 2025

We're still seeing issues with the preview, so this should not be merged until that is solved.

How do you pass the BASE_URL to tox? It may need something like passenv.

@bsipocz
Copy link
Member Author

bsipocz commented Nov 8, 2025

I'm not sure what we consider blocking for this PR, there are some issues I see. For anything that is a follow-up we may want to open an issue so we don't forget them.

  • build time on CircleCI is significantly longer than it was with JB1. I could easily consider this as a follow-up though.
  • cleaning up build errors and warnings -- currently we don't run with strict (and we didn't run with a similar manner as we were not failing JB1 builds with these broken links either)
  • deployment not to a gh-pages branch -- this could probably be done as a follow-up, too and now just use the file we have. (I need to make my other repos consistent, too so can do this repo along with those).
  • cleanup environment.yaml, switch to a more modern and reliable solution (eyeing pixi)

@stefanv
Copy link
Contributor

stefanv commented Nov 8, 2025

  • build time on CircleCI is significantly longer than it was with JB1. I could easily consider this as a follow-up though.

Do you have a sense of why this is? Parallel execution of notebooks, setup time, etc.?

@stefanv
Copy link
Contributor

stefanv commented Nov 8, 2025

Last build failed with timeout :/

@bsipocz
Copy link
Member Author

bsipocz commented Nov 8, 2025

build time on CircleCI is significantly longer than it was with JB1. I could easily consider this as a follow-up though.

Do you have a sense of why this is? Parallel execution of notebooks, setup time, etc.?

Unfortunately I don't have any good guesses. My experience with the IRSA notebooks were that the build is faster, but in reality those are the worst benchmarks ever as all of them are about querying data on remote servers and that has a genuine huge scatter in runtime (and gazillion timeouts that are due to the server side).

Copy link
Member Author

@bsipocz bsipocz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is working now.

I would need decisions on the checkpoints listed in the comment above, do we want one big giant PR with everything or rolling it out with follow-ups. And either case, will need a review and approval (btw, how do you feel about requiring approvals in this repo -- I could argue both ways as I tend to do quick infrastructure fixes that needs to go asap, but even though could benefit from being forced to be looked at by someone else)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that the conda job has been removed this file is not being tested to work anywhere.

@rossbar - do you have preferences of how to progress?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would nuke the mentions of it, but again I wasn't the conda user. I would think we should provide an env that works ™️ and passes the test, but this one didn't so we may want to jump ships and recommend pixi anyways.

OTOH, I would block the merging of this on the condition that the launch in binder or somewhere else should work, so provide at least one way to users to try the content without mandating them to fix their local installs.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I'd punt on it for now; i.e. leave the environment.yml (untested) and work on switching to better tooling in followups.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, added it to the follow-up list (to be lifted into a separate issue)

@rossbar
Copy link
Collaborator

rossbar commented Nov 10, 2025

build time on CircleCI is significantly longer than it was with JB1. I could easily consider this as a follow-up though.

Is the build picking up the "non-executable" articles, i.e. the RL and NLP ones? That would be my guess - we were skipping the execution with JB1 stack, but that was done in conf.py which is no longer used.

cleaning up build errors and warnings -- currently we don't run with strict (and we didn't run with a similar manner as we were not failing JB1 builds with these broken links either)

I think this should be a follow-up: for the most part these seem like obvious improvements (suppressing returned text, fixing missing links) but I haven't looked at the whole list. Either way, this shouldn't be a blocker IMO!

deployment not to a gh-pages branch -- this could probably be done as a follow-up, too and now just use the file we have. (I need to make my other repos consistent, too so can do this repo along with those).

+1 for follow-up here too, since my eagerness to move it into #271 broke the deployed site 🙃 . It'd be good to figure out the build timeout issues and have the execution bit fully stable before setting up the auto-deploy IMO.

@bsipocz
Copy link
Member Author

bsipocz commented Nov 10, 2025

The timeout was a one off, so I wouldn't yet worry about it.
And you may be right about picking up more notebooks. Shall we ignore them? The build is not too long, so we can have everything we currently run.

@rossbar
Copy link
Collaborator

rossbar commented Nov 10, 2025

And you may be right about picking up more notebooks. Shall we ignore them? The build is not too long, so we can have everything we currently run.

Yes definitely - these two in particular (RL and NLP) were never executable to begin with and NLP in particular has (IIRC) a very long runtime for what little is implemented!

@bsipocz
Copy link
Member Author

bsipocz commented Nov 10, 2025

I'll be offline tomorrow, but will pick this up once I'm done with the driving.

@rossbar
Copy link
Collaborator

rossbar commented Nov 11, 2025

The timeout was a one off, so I wouldn't yet worry about it.

I now suspect it's related to threading on circleCI again - we solved this problem but if the env vars are not being picked up in the tox environment then this would cause issues and unreliable execution times. Take a look at #276 and LMK if that pattern is the right one for configuring (and see if it shouldn't be added here too!)

@bsipocz
Copy link
Member Author

bsipocz commented Nov 11, 2025

Yes, I'll rebase this to pick up that change. Tox not passing along env variables is a security feature rather than a bug (it annoys me occasionally, too)

@bsipocz
Copy link
Member Author

bsipocz commented Nov 12, 2025

Yes definitely - these two in particular (RL and NLP) were never executable to begin with and NLP in particular has (IIRC) a very long runtime for what little is implemented!

Unfortunately it's not these two causing problems, they were not even added to the rendering as they were originally in the "Article" category and we said we will axe that.

@bsipocz
Copy link
Member Author

bsipocz commented Nov 12, 2025

Good news is that cache is really nicely picked up, so now I cannot really give you new timings.

@rossbar
Copy link
Collaborator

rossbar commented Nov 12, 2025

Good news is that cache is really nicely picked up, so now I cannot really give you new timings.

I'm fairly confident the OMP_NUM_THREADS was the primary culprit here. Further evidence that this is the case is that the problem is only on CI (building locally is fine).

@bsipocz
Copy link
Member Author

bsipocz commented Nov 12, 2025

I'm fairly confident the OMP_NUM_THREADS was the primary culprit here. Further evidence that this is the case is that the problem is only on CI (building locally is fine).

Great! Do you see anything else then?

Copy link
Collaborator

@rossbar rossbar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I think the critical stuff is in place - I'm ready to pull the trigger! Thanks @bsipocz for spearheading this!

I'd like to make sure @melissawm has a chance to look things over or at least voice any objections/concerns about switching over to the JB2-style infrastructure!

@melissawm
Copy link
Member

I am 100% in agreement but you never need to wait for me - ship it! 🚀

@bsipocz
Copy link
Member Author

bsipocz commented Nov 13, 2025

I am 100% in agreement but you never need to wait for me - ship it! 🚀

OK, let's do it then.
I'm at a meeting/conference this week, so will be either super quick or be super delayed in fixing up stuff. I suspect we may now introduce some link breakage downstream, but I'm not 100% certain anyone was linking to individual pages, etc. So keep an eye out for those reports.

@bsipocz bsipocz merged commit 40e2850 into numpy:main Nov 13, 2025
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

infrastructure Issues relevant to infrasructure, rather than content

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants