WEB: Deployment of the new website in production #28528

Closed
datapythonista opened this issue Sep 19, 2019 · 16 comments · Fixed by #33341
Labels
Needs Discussion Requires discussion from core team before further action Web pandas website

Comments

@datapythonista
Member

datapythonista commented Sep 19, 2019

Until now, we've been deploying the website manually afaik, from the separate repo.

What I would do with the new website is the following (open to discussion):

  • Have a job running to automatically deploy the website from master, possible frequencies:
    • Every commit
    • Daily
    • Weekly
  • Build the website normally, as we do in https://github.com/pandas-dev/pandas/blob/master/azure-pipelines.yml#L122
  • Deploy it to the server with rsync, something like: rsync -avz --delete --exclude="docs" ~/web/build pandas_ci@pandas.io:/var/www/pandas
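
The rsync step in the list above can be sketched as a small Python helper that only assembles the command line. This is a minimal sketch under stated assumptions, not the project's actual deployment script: the function name `build_rsync_command` and its `dry_run` parameter are hypothetical, while the flags themselves (`-a`, `-v`, `-z`, `--delete`, `--exclude`, `--dry-run`) are standard rsync options. The source and destination are the ones quoted in the proposal.

```python
import shlex


def build_rsync_command(source, destination, excludes=(), dry_run=False):
    """Assemble an rsync invocation as an argument list (hypothetical helper)."""
    command = ["rsync", "-avz", "--delete"]
    if dry_run:
        # --dry-run reports what would change without touching the server.
        command.append("--dry-run")
    for pattern in excludes:
        command.append(f"--exclude={pattern}")
    command += [source, destination]
    return command


command = build_rsync_command(
    "~/web/build",
    "pandas_ci@pandas.io:/var/www/pandas",
    excludes=["docs"],
    dry_run=True,
)
# Print the command in a copy-pasteable, shell-quoted form.
print(shlex.join(command))
```

rsync does not delete excluded paths on the receiving side unless `--delete-excluded` is also given, so with `--exclude=docs` the docs directory on the server is left alone. If the list were passed to `subprocess.run`, the `~` would not be expanded by a shell, so the source path would need `os.path.expanduser` first.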

For the docs, I see two main options:

  • Inside docs/ have a directory per language (currently English), and inside them one per version (with a symlink stable/ -> 0.25.1), so the urls would be something like pandas.io/docs/en/stable or pandas.io/docs/en/0.24.0 (and redirect pandas.io/docs/ to pandas.io/docs/en/stable)
  • Copy the docs directly in pandas.io/docs/ and keep the old version in directories there pandas.io/docs/0.24.0/

In both cases, we can deploy the master docs into a dev/ directory together with the versions.
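
To make the two layouts easier to compare, here is a rough sketch that builds the URLs each option would produce. The function names are hypothetical and the base `pandas.io/docs` is taken from the examples above; nothing here reflects a decided scheme.

```python
def docs_url_with_language(version, language="en", base="pandas.io/docs"):
    """Option 1: one directory per language, then one per version."""
    return f"{base}/{language}/{version}"


def docs_url_flat(version=None, base="pandas.io/docs"):
    """Option 2: latest docs at the root, other versions in subdirectories."""
    return f"{base}/" if version is None else f"{base}/{version}/"


# "stable" and "dev" behave like versions in both layouts.
for version in ("stable", "0.24.0", "dev"):
    print(docs_url_with_language(version), docs_url_flat(version))
```

In the first layout the bare `pandas.io/docs/` would still need a redirect to the stable English docs; in the second it serves the latest docs directly.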

I think the first option is a bit simpler to maintain, and the second makes the url a bit simpler. I don't have a strong preference.

I think this is very simple, but requires that all pages that we don't want to version are in the web, and not in the docs. So, if the roadmap is in the docs, the version we will have will be the stable version (0.25.1) and not master.

@jorisvandenbossche you proposed to have the following pages in the docs:

  • Roadmap
  • Ecosystem
  • Contributing to pandas

Do you have a proposal for how to serve or deploy these? Or are you ok having the stable version of those? I thought about it, and I couldn't find any option I liked to keep those in the docs, that's why I'm proposing to move them to the website.

@datapythonista datapythonista added Needs Discussion Requires discussion from core team before further action Web pandas website labels Sep 19, 2019
@WillAyd
Member

WillAyd commented Sep 19, 2019

Why do we want to deploy daily from master? I feel like that might add a maintenance burden for not a lot of gain

@jorisvandenbossche
Member

jorisvandenbossche commented Sep 19, 2019

Why do we want to deploy daily from master? I feel like that might add a maintenance burden for not a lot of gain

Currently, Tom builds the website manually and uploads it, which means even more burden (for him), and it often does not happen directly after a change.
So automating this makes sense to me?

@jorisvandenbossche
Member

@datapythonista in the daily deploy, this can also include the dev docs? (similar to a version: pandas.io/docs/en/dev/)

@WillAyd
Member

WillAyd commented Sep 19, 2019

I'm mostly concerned about the frequency. Pushing daily changes to a live production documentation system seems like an easy way for mistakes in our development process to impact people trying to use the documentation day to day.

@datapythonista
Member Author

@datapythonista in the daily deploy, this can also include the dev docs? (similar to a version: pandas.io/docs/en/dev/)

Forgot to add it, but that's surely an option, and I'm happy about it.

I'm mostly concerned about the frequency.

I'm ok with weekly too. But a couple of comments to consider:

  • Personally, if I change something on the website, I'd like to see it live not long after.
  • The stable documentation will only be updated when releasing, so it shouldn't be affected.
  • I think rsync should be quite harmless: if something is broken, it simply won't update.
  • If we deploy weekly, when we discover that something wrong has been deployed, we will probably fix it manually. With daily deployments, I think we can just merge the fix into master, and wait.

@jorisvandenbossche
Member

I'm mostly concerned about the frequency.

We will typically not change the website very often (at least not daily), but if we do, we want to have it live quickly (eg for announcing a new release).
So to have it practical, that means that the automatic update should be done either on each commit (if the web sources are touched, which can be less than daily) or on a regular basis such as daily.
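
The "only if the web sources are touched" condition could be checked with something as simple as the sketch below. It assumes the website sources live under a top-level `web/` directory and that the list of changed paths comes from the CI system or from something like `git diff --name-only`; the function name and the example paths are hypothetical.

```python
def touches_web_sources(changed_files, web_dir="web"):
    """Return True if any changed path lies inside the website directory."""
    prefix = web_dir.rstrip("/") + "/"
    return any(path.startswith(prefix) for path in changed_files)


# Example: one commit touching the website, one touching only library code.
print(touches_web_sources(["web/pandas/index.md", "pandas/core/frame.py"]))
print(touches_web_sources(["pandas/core/frame.py"]))
```

Matching on the `web/` prefix rather than on the substring `web` avoids false positives from unrelated files whose names merely contain it.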

@TomAugspurger
Contributor

+1 for continuously deploying it whenever the web stuff is touched.

@datapythonista
Member Author

+1 for continuously deploying it whenever the web stuff is touched.

On second thought, also +1 on this (immediate fixes, and the CI will be simpler with a single job for all). And if for whatever reason this causes trouble, it should be very easy to change the frequency.

Updated the description to show the different options.

@jorisvandenbossche
Member

One problem with this "whenever web stuff is touched" is that this heuristic does not work for the dev docs. Currently (for github pages) you first copy website and docs to a single directory, and then rsync both together. But I suppose they can also be rsynced separately? (and then the dev docs on each commit? that's maybe a bit too much?)

@TomAugspurger
Contributor

Let's just deploy each commit then :)

@datapythonista
Member Author

I'd run the rsync after every merge to master, since we're building the web and docs already. So it's just adding the rsync command, which in most cases won't have much to synchronize.

I think I don't get what you have in mind. Does what I propose make sense?

@jorisvandenbossche
Member

Not sure it fits on this issue, but it has to do with deployment of the docs, so posting here (but can also open a new issue).

Now that we are starting to use the newly deployed web/docs publicly, we still need to decide on a url scheme.
(For example, Tom used links to https://dev.pandas.io/docs/ in the rc announcement, and Marc used https://pandas.io, which links to https://pandas.io/docs/, on Twitter. Long term, I think neither of those docs links is meant to stay?)

I think there are two parts (base url vs doc url scheme), and so the first question is:

Which base url do we use? (pandas.pydata.org vs pandas.io)

  • I am not sure we actually ever discussed this, apart from some varying assumptions (eg I assumed we would use the new one, and I think Tom assumed we would keep the old). So let's make this more explicit.
  • Technical question: if we would like to keep pandas.pydata.org, is this actually possible combined with the new hosting?
  • If we go with the new one (pandas.io), how do we deal with all existing links online?
    • For older versions of the docs (the ones linking to eg https://pandas.pydata.org/pandas-docs/version/0.25/), we can probably keep them alive? (but for how long?)
    • For the links to "stable", we need redirects? And the same for the pandas.pydata.org website? Is this possible?

And a second question (although only applicable if we go with a new base url): what url scheme for the docs to use?
I assume we want something like docs/stable/, docs/dev/ (instead of the docs/ right now), and then for new releases also add the docs/version/x.x/ pattern? Although for this last one, we might want to drop the "version" in it and just go with docs/x.x/ for shorter urls.

@jorisvandenbossche
Member

what url scheme for the docs to use?

Ah, and I see now that @datapythonista already mentioned this above, and additionally had the idea to already be future proof and add a /en/ to it for English.

@TomAugspurger
Contributor

for example, Tom used links to https://dev.pandas.io/docs/ in the rc announcement,

Whoops, I meant to use https://pandas.pydata.org/pandas-docs/version/1.0.0/ for those. Oh well.

Which base url do we use?

Slight preference for pandas.pydata.org, just since there's so much material referencing that. We would of course redirect if we made the move, so this isn't a huge deal.

if we would like to keep pandas.pydata.org, is this actually possible combined with the new hosting?

@aterrel can hopefully answer that.

@datapythonista
Member Author

The domains and the hosting are independent: we can use the old server with pandas.io, and pandas.pydata.org with the new hosting. I think there was some discussion about using pandas.io when Wes proposed it, and people were happy with it. Maybe I just assumed it; in any case it's good to discuss it if anyone has anything to discuss.

I'd have pandas.io/docs/ for the docs, being a symlink to the latest version (so, on the server we could have /docs/v1.0.0/en/.../ (or whatever), but for the users navigating I'd simply use /docs/).
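
The symlink idea (also mentioned earlier in the thread as stable/ -> 0.25.1) could look roughly like the sketch below. It is only an illustration on a POSIX filesystem: the helper name `point_docs_at`, the link name `stable`, and the flat directory of version folders are assumptions, not the server's real layout.

```python
import os
import tempfile


def point_docs_at(docs_root, version, link_name="stable"):
    """Make docs_root/link_name a symlink to docs_root/version (hypothetical helper)."""
    link_path = os.path.join(docs_root, link_name)
    temp_path = link_path + ".tmp"
    if os.path.lexists(temp_path):
        os.remove(temp_path)
    # Use a relative target so the link survives moving the whole docs tree.
    os.symlink(version, temp_path)
    # Renaming over the old link swaps it in one step on POSIX systems,
    # so visitors never hit a moment where the link is missing.
    os.replace(temp_path, link_path)
    return link_path


# Demonstration in a throwaway directory instead of a real web root.
with tempfile.TemporaryDirectory() as docs_root:
    for version in ("0.25.1", "1.0.0"):
        os.mkdir(os.path.join(docs_root, version))
        link = point_docs_at(docs_root, version)
    print(os.readlink(link))
```

With this approach a release only has to upload the new version directory and repoint the link; older versions stay in place under their own directories.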

I think we all agree that whatever we do, old urls should redirect to the new equivalent urls. I don't think that should be complex: I'd probably redirect pandas.pydata.org to the new server, and configure HTTP headers with the redirects in the load balancer.
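
As an illustration of what "redirect to the new equivalent urls" might mean, here is a sketch of just the mapping logic. It assumes pandas.io as the new host and a docs/x.x/ layout without the "version" segment, neither of which is decided in this thread; the constant and function names are hypothetical, and a real setup would express these rules in the server or load balancer configuration rather than in Python.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_HOST = "pandas.pydata.org"
NEW_HOST = "pandas.io"  # assumption: the base url is still under discussion
OLD_DOCS_PREFIX = "/pandas-docs/"
NEW_DOCS_PREFIX = "/docs/"  # assumption: hypothetical new docs root


def redirect_target(old_url):
    """Map an old url to a hypothetical new one, or None for other hosts."""
    parts = urlsplit(old_url)
    if parts.netloc != OLD_HOST:
        return None
    path = parts.path
    if path.startswith(OLD_DOCS_PREFIX):
        rest = path[len(OLD_DOCS_PREFIX):]
        # Drop the "version/" segment, following the shorter docs/x.x/ idea.
        if rest.startswith("version/"):
            rest = rest[len("version/"):]
        path = NEW_DOCS_PREFIX + rest
    # Keep the query string and fragment so deep links keep working.
    return urlunsplit(("https", NEW_HOST, path, parts.query, parts.fragment))


print(redirect_target("https://pandas.pydata.org/pandas-docs/version/0.25/"))
print(redirect_target("https://pandas.pydata.org/pandas-docs/stable/"))
```

Keeping the path after the docs prefix unchanged means existing deep links into the "stable" docs would land on the matching page, provided the page structure itself has not moved.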

@WillAyd
Member

WillAyd commented Jan 15, 2020

Also a slight preference for pandas.pydata.org, for the same reasons described by Tom.

4 participants