x/pkgsite: migrate codebase to paths-based data model #39629

Closed
julieqiu opened this issue Jun 17, 2020 · 219 comments

@julieqiu julieqiu commented Jun 17, 2020

As part of the redesign that we have planned for this year, we will be migrating our data model to use a paths table.

Structs and functions to be deprecated will be prefixed with "Legacy".
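
As a rough illustration of this naming convention, the sketch below pairs a replacement function with its "Legacy" counterpart and marks the latter with Go's standard `Deprecated:` doc comment. The `store` type, the `ModuleInfo` fields, and the in-memory map are assumptions made for the example; they are not the actual pkgsite definitions.

```go
package main

import (
	"errors"
	"fmt"
)

// ModuleInfo is a hypothetical, simplified stand-in for module metadata.
type ModuleInfo struct {
	ModulePath string
	Version    string
}

var errNotFound = errors.New("module not found")

// store is an in-memory stand-in for the database, used only in this sketch.
type store struct {
	modules map[string]ModuleInfo // keyed by "path@version"
}

// GetModuleInfo is the replacement that new callers should use.
func (s *store) GetModuleInfo(path, version string) (ModuleInfo, error) {
	m, ok := s.modules[path+"@"+version]
	if !ok {
		return ModuleInfo{}, errNotFound
	}
	return m, nil
}

// LegacyGetModuleInfo keeps existing callers compiling during the migration.
// In this sketch it simply delegates; a real legacy function would keep
// reading from the old tables until its callers are gone.
//
// Deprecated: use GetModuleInfo instead.
func (s *store) LegacyGetModuleInfo(path, version string) (ModuleInfo, error) {
	return s.GetModuleInfo(path, version)
}

func main() {
	s := &store{modules: map[string]ModuleInfo{
		"example.com/m@v1.0.0": {ModulePath: "example.com/m", Version: "v1.0.0"},
	}}
	m, err := s.LegacyGetModuleInfo("example.com/m", "v1.0.0")
	fmt.Println(m.ModulePath, m.Version, err)
}
```

Tools such as staticcheck flag calls to identifiers whose doc comment contains a `Deprecated:` paragraph, which makes the remaining legacy call sites easy to find.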

Related: #39621

TODOs:

  • populate new tables
    • update worker/postgres code to insert new data into new tables (paths, documentation, readmes, and package_imports)
  • add new postgres functions
    • #40032: add GetModuleInfo function to replace LegacyGetModuleInfo
    • update postgres.GetImports to read from package_imports table
    • add function to replace LegacyGetDirectory
    • add GetDirectory function to return data for a path
    • #40150: add a function to get the version history for a path
    • #40027: add GetLicenses function to return licenses for a given path
  • update frontend code:
    • add new servePackagePage code path, such that:
      • doc tab reads from documentation table
      • overview tab reads from readmes table
      • subdirectories tab reads from paths table
      • versions tab reads from paths table
      • imports tab reads from package_imports table
      • licenses tab reads from paths table
    • add new serveModulePage code path, such that:
      • overview tab reads from readmes table
      • packages tab reads from paths table
      • versions tab reads from paths table
      • licenses tab reads from paths table
    • add new serveDirectoryPage code path, such that:
      • overview tab reads from readmes table
      • subdirectories tab reads from paths table
      • versions tab exists, and reads from paths table
      • licenses tab reads from paths table
    • update latest badge to read from paths table
    • update search to read from paths table
    • move SearchResult to the internal package
  • split GetDirectory into multiple queries:
    • add GetDirectoryMeta
    • #41018: GetReadme
    • #41017: GetDocumentation
    • add support for field sets
  • turn on experiment flags
    • InsertDirectories
    • UseDirectories
    • UsePathInfo
    • UsePackageImports
  • remove experiment flags
    • InsertDirectories
    • UseUnits
    • UsePathInfo
    • UsePackageImports
  • replace use of Legacy functions and deprecate
    • LegacyGetDirectory
    • LegacyGetModuleLicenses
    • LegacyGetPackage
    • LegacyGetPackageLicenses
    • LegacyGetPackagesInModule
    • LegacyGetModuleInfo
  • deprecate use of packages:
    1. replace use of packages for search upsert
      • read from paths instead of packages
      • drop packages FK
      • fix tests
    2. do not insert into packages table
      • remove from insertModule
    3. remove from fetchModule
      • remove LegacyPackages from fetchModule
  • delete legacy structs
    • LegacyDirectory
    • LegacyPackage
    • Module.LegacyPackages
  • delete old tables / columns
    • table packages (replaced by paths and documentation tables)
    • table imports (replaced by package_imports table)
    • columns modules.readme_file_path, modules.readme_contents (replaced by readmes table)
  • change PKs:
    • change modules PK to modules.id
    • drop licenses.module_path and licenses.version
    • change licenses PK to (module_id, file_path)
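
The "split GetDirectory into multiple queries" and "add support for field sets" items above can be pictured with a small bit-set sketch: the caller names the optional data it needs, and only the matching queries run. The `FieldSet` type, its constants, and `queriesFor` are assumptions for illustration; the query names are borrowed from the TODO list, and the real signatures may differ.

```go
package main

import "fmt"

// FieldSet is a bit set naming optional groups of fields to load.
type FieldSet uint

const (
	WithReadme FieldSet = 1 << iota
	WithDocumentation
	WithImports
	WithLicenses
)

const (
	// MinimalFields requests only the always-loaded metadata.
	MinimalFields FieldSet = 0
	// AllFields requests every optional group.
	AllFields = WithReadme | WithDocumentation | WithImports | WithLicenses
)

// Has reports whether every field in want is present in fs.
func (fs FieldSet) Has(want FieldSet) bool { return fs&want == want }

// queriesFor lists the (hypothetical) queries a directory fetch would run
// for the requested fields. Metadata is always fetched.
func queriesFor(fs FieldSet) []string {
	qs := []string{"GetDirectoryMeta"}
	if fs.Has(WithReadme) {
		qs = append(qs, "GetReadme")
	}
	if fs.Has(WithDocumentation) {
		qs = append(qs, "GetDocumentation")
	}
	return qs
}

func main() {
	fmt.Println(queriesFor(MinimalFields))
	fmt.Println(queriesFor(WithReadme | WithDocumentation))
}
```

The benefit of this shape is that a tab needing only the README does not pay for loading documentation, which is typically the largest column.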
@julieqiu julieqiu self-assigned this Jun 17, 2020
@gopherbot gopherbot added this to the Unreleased milestone Jun 17, 2020

@gopherbot gopherbot commented Jul 1, 2020

Change https://golang.org/cl/240680 mentions this issue: internal/frontend: use paths table to check for existence

gopherbot pushed a commit to golang/pkgsite that referenced this issue Jul 1, 2020
A new experiment is added, ExperimentUsePathInfoToCheckExistence, which
uses the paths table to check if a package/module/directory exists,
regardless of whether we are using the old or new data model to render
the pages.

This is a first step toward using the new paths-based data model. It
also streamlines the flow for frontend fetches.

Updates golang/go#39629

Change-Id: I6178776264cbe71ffb3fb87b3409df902b33b2d4
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/240680
Reviewed-by: Jonathan Amsterdam <jba@google.com>

@gopherbot gopherbot commented Jul 4, 2020

Change https://golang.org/cl/240938 mentions this issue: internal: rename GetImports to LegacyGetImports

@gopherbot gopherbot commented Jul 7, 2020

Change https://golang.org/cl/241181 mentions this issue: internal/postgres: read from package_imports

@gopherbot gopherbot commented Jul 7, 2020

Change https://golang.org/cl/241182 mentions this issue: internal/frontend: use GetPathInfo for search

@gopherbot gopherbot commented Jul 7, 2020

Change https://golang.org/cl/241183 mentions this issue: internal/frontend: use GetPathInfo for latest version

@gopherbot gopherbot commented Jul 7, 2020

Change https://golang.org/cl/241064 mentions this issue: internal/worker: replace GetModuleInfo with LegacyGetModuleInfo

gopherbot pushed a commit to golang/pkgsite that referenced this issue Jul 7, 2020
GetImports now reads from the package_imports table when the
"use-package-imports" experiment is on.

Updates golang/go#39629

Change-Id: I80970eb4806c8ccf2c01e5dac6a04ff0978fd681
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/241181
Reviewed-by: Jonathan Amsterdam <jba@google.com>
gopherbot pushed a commit to golang/pkgsite that referenced this issue Jul 7, 2020
Tests in internal/worker now use GetModuleInfo instead of
LegacyGetModuleInfo.

Updates golang/go#39629

Change-Id: I524750809c3e2dcf795bb026636fb76aeba280db
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/241064
Reviewed-by: Jonathan Amsterdam <jba@google.com>
gopherbot pushed a commit to golang/pkgsite that referenced this issue Jul 7, 2020
The search endpoint now uses GetPathInfo to check if a package
exists, when the "use-path-info" experiment is on.

Updates golang/go#39629

Change-Id: I5c4b3c3250e06a33a647aac74f91849ceb4105ca
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/241182
Reviewed-by: Jonathan Amsterdam <jba@google.com>
gopherbot pushed a commit to golang/pkgsite that referenced this issue Jul 7, 2020
LatestVersion now uses GetPathInfo to check for the latest version of a
path, when the "use-path-info" experiment is on.

Updates golang/go#39629

Change-Id: I374f707cb06ccd8147925229eb48f2d08caf0a4f
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/241183
Reviewed-by: Jonathan Amsterdam <jba@google.com>

@gopherbot gopherbot commented Jul 7, 2020

Change https://golang.org/cl/241320 mentions this issue: internal: delete insert-directories experiment

gopherbot pushed a commit to golang/pkgsite that referenced this issue Jul 7, 2020
The insert-directories experiment flag is deleted, since we have already
inserted data for all modules into the paths, package_imports,
documentation, and readmes tables, and have been running that code path
for a while.

Updates golang/go#39629

Change-Id: I323850a462672c41ad0c67b6ab2b173bb32bf441
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/241320
Reviewed-by: Jonathan Amsterdam <jba@google.com>

@gopherbot gopherbot commented Jul 8, 2020

Change https://golang.org/cl/241397 mentions this issue: internal/frontend: add serveDetailsPage

@gopherbot gopherbot commented Jul 9, 2020

Change https://golang.org/cl/241441 mentions this issue: internal/frontend: merge createPackage functions

gopherbot pushed a commit to golang/pkgsite that referenced this issue Jul 9, 2020
serveDetailsPage is added, which replaces servePackagePageNew in serving
details pages when the "use-directories" experiment flag is on. Like
servePackagePageNew, serveDetailsPage supports the package/directory
views. It also supports the module view, so that we can stop using
LegacyModuleInfo and reading from modules.readme_file_path and
modules.readme_contents.

For golang/go#39629

Change-Id: I8e664bf1e9174a630db6b723c949ba1e9aa0dc9b
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/241397
Reviewed-by: Jonathan Amsterdam <jba@google.com>
gopherbot pushed a commit to golang/pkgsite that referenced this issue Jul 9, 2020
The two functions for creating a frontend.Package, legacyCreatePackage
and createPackageNew, are merged into a single createPackage function.

createPackage takes the new PackageMeta type as input. It does not take
any legacy structs as input and returns a frontend.Package.

In future CLs, PackageMeta will be used to replace LegacyDirectory when
fetching data for the directories tab.

For golang/go#39629

Change-Id: I80ec5272c6f8e237f0752938a89c633e3a1b81f5
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/241441
Reviewed-by: Jonathan Amsterdam <jba@google.com>

@gopherbot gopherbot commented Jul 10, 2020

Change https://golang.org/cl/241839 mentions this issue: internal/frontend: update legacy and new terminology

@gopherbot gopherbot commented Jul 10, 2020

Change https://golang.org/cl/241859 mentions this issue: internal/frontend: add serveDirectoryPage

@gopherbot gopherbot commented Jul 10, 2020

Change https://golang.org/cl/241838 mentions this issue: internal/postgres: add GetPackagesInDirectory

@gopherbot gopherbot commented Jul 10, 2020

Change https://golang.org/cl/241898 mentions this issue: internal/postgres: move version code to separate file

@gopherbot gopherbot commented Jul 10, 2020

Change https://golang.org/cl/241899 mentions this issue: internal: add Legacy prefix to GetPsuedoVersions and GetTaggedVersions functions

@gopherbot gopherbot commented Jul 10, 2020

Change https://golang.org/cl/241900 mentions this issue: internal: remove New suffix from structs

gopherbot pushed a commit to golang/pkgsite that referenced this issue Jul 10, 2020
…s functions

The GetPsuedoVersions* and GetTaggedVersions* functions read from the
packages table, and will be replaced by a single GetVersionsForPath
function.

These functions are now prefixed with "Legacy" to indicate that they
will be deprecated.

For golang/go#39629

Change-Id: I7f89f9890f135b5ddb363a51a9706e48d02594b5
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/241899
Reviewed-by: Jonathan Amsterdam <jba@google.com>

@gopherbot gopherbot commented Oct 26, 2020

Change https://golang.org/cl/265097 mentions this issue: migrations: drop modules.readme_* columns

@gopherbot gopherbot commented Oct 27, 2020

Change https://golang.org/cl/265242 mentions this issue: migrations: drop imports table

@gopherbot gopherbot commented Oct 27, 2020

Change https://golang.org/cl/265247 mentions this issue: migrations: drop idx_paths_path

@gopherbot gopherbot commented Oct 27, 2020

Change https://golang.org/cl/265243 mentions this issue: migrations: drop packages table

@gopherbot gopherbot commented Oct 27, 2020

Change https://golang.org/cl/265249 mentions this issue: internal/worker: remove imports in TestFetchAndUpdateState_NotFound

@gopherbot gopherbot commented Oct 27, 2020

Change https://golang.org/cl/265240 mentions this issue: migrations: add licenses_module_id_file_path

@gopherbot gopherbot commented Oct 27, 2020

Change https://golang.org/cl/265241 mentions this issue: internal/postgres: change ON CONFLICT key in insertLicenses

@gopherbot gopherbot commented Oct 27, 2020

Change https://golang.org/cl/265248 mentions this issue: migrations: add search_documents.path_id column

gopherbot pushed a commit to golang/pkgsite that referenced this issue Oct 27, 2020
The licenses table currently has two FKs to modules.
licenses_module_path_fkey is no longer needed and is dropped.

For golang/go#39629

Change-Id: Ibf5a6400b1cf90e82748dcc53f45d23fdc29286d
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/265077
Trust: Julie Qiu <julie@golang.org>
Run-TryBot: Julie Qiu <julie@golang.org>
TryBot-Result: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jamal Carvalho <jamal@golang.org>
gopherbot pushed a commit to golang/pkgsite that referenced this issue Oct 27, 2020
A unique index is added to licenses(module_id, file_path).

This is temporary to enable insertLicenses, until the PK is switched.

For golang/go#39629

Change-Id: Iba9d5da93e8dffe477cada454a84248bd48a9b3c
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/265240
Trust: Julie Qiu <julie@golang.org>
Run-TryBot: Julie Qiu <julie@golang.org>
TryBot-Result: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jamal Carvalho <jamal@golang.org>
gopherbot pushed a commit to golang/pkgsite that referenced this issue Oct 27, 2020
insertLicenses now uses licenses (module_id, file_path) as the ON
CONFLICT key.

For golang/go#39629

Change-Id: Ib85a61f538a176c1bfff7a070121162405ef21dc
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/265241
Trust: Julie Qiu <julie@golang.org>
Run-TryBot: Julie Qiu <julie@golang.org>
TryBot-Result: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jamal Carvalho <jamal@golang.org>
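
To make the ON CONFLICT change concrete, here is a minimal sketch of building such an upsert statement. The helper `buildUpsert`, the `contents` column, and the `DO NOTHING` action are assumptions; the thread only states that (module_id, file_path) is the conflict key. PostgreSQL requires a unique index or constraint on the conflict columns, which is why the temporary unique index was added first.

```go
package main

import (
	"fmt"
	"strings"
)

// buildUpsert returns a PostgreSQL INSERT statement with $N placeholders
// that skips rows conflicting on conflictCols. It is a hypothetical helper,
// not the query used by insertLicenses.
func buildUpsert(table string, cols, conflictCols []string) string {
	placeholders := make([]string, len(cols))
	for i := range cols {
		placeholders[i] = fmt.Sprintf("$%d", i+1)
	}
	return fmt.Sprintf(
		"INSERT INTO %s (%s) VALUES (%s) ON CONFLICT (%s) DO NOTHING",
		table,
		strings.Join(cols, ", "),
		strings.Join(placeholders, ", "),
		strings.Join(conflictCols, ", "),
	)
}

func main() {
	q := buildUpsert(
		"licenses",
		[]string{"module_id", "file_path", "contents"}, // contents is assumed
		[]string{"module_id", "file_path"},
	)
	fmt.Println(q)
}
```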
gopherbot pushed a commit to golang/pkgsite that referenced this issue Oct 27, 2020
The primary key for licenses is changed to (module_id, file_path).

The licenses.module_path and licenses.version columns will be dropped in
a future CL.

For golang/go#39629

Change-Id: I3ae8f7c582013a897152bdf5926f6a9287136c5e
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/265019
Trust: Julie Qiu <julie@golang.org>
Run-TryBot: Julie Qiu <julie@golang.org>
Reviewed-by: Jamal Carvalho <jamal@golang.org>
TryBot-Result: kokoro <noreply+kokoro@google.com>
gopherbot pushed a commit to golang/pkgsite that referenced this issue Oct 27, 2020
The columns modules.readme_file_path and modules.readme_contents are
dropped, since they are no longer being used.

For golang/go#39629

Change-Id: I3829bdeb4eea9a5b8fb4a706641714a3aa8bfefc
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/265097
Trust: Julie Qiu <julie@golang.org>
Run-TryBot: Julie Qiu <julie@golang.org>
Reviewed-by: Jamal Carvalho <jamal@golang.org>
TryBot-Result: kokoro <noreply+kokoro@google.com>
gopherbot pushed a commit to golang/pkgsite that referenced this issue Oct 27, 2020
TestFetchAndUpdateState_NotFound no longer references the imports table.

For golang/go#39629

Change-Id: Icb60c0bf46656cb009c84302778ccf7f8e1fcbce
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/265249
Trust: Julie Qiu <julie@golang.org>
Run-TryBot: Julie Qiu <julie@golang.org>
TryBot-Result: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jamal Carvalho <jamal@golang.org>
gopherbot pushed a commit to golang/pkgsite that referenced this issue Oct 27, 2020
The imports table is no longer used and is dropped.

This frees up 136 GB of table space and 478 GB of indexes.

For golang/go#39629

Change-Id: I5a381832d9d4d69408fb62052782097436c40371
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/265242
Trust: Julie Qiu <julie@golang.org>
Run-TryBot: Julie Qiu <julie@golang.org>
TryBot-Result: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jamal Carvalho <jamal@golang.org>
gopherbot pushed a commit to golang/pkgsite that referenced this issue Oct 27, 2020
The packages table is no longer used and is dropped.

This frees up 827 GB of table space and 78 GB of indexes.

For golang/go#39629

Change-Id: I149707013c3fd39122b55f79976893ca979982cb
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/265243
Trust: Julie Qiu <julie@golang.org>
Run-TryBot: Julie Qiu <julie@golang.org>
TryBot-Result: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jamal Carvalho <jamal@golang.org>
gopherbot pushed a commit to golang/pkgsite that referenced this issue Oct 27, 2020
idx_paths_path is a duplicate of index paths_path_module_id_key and is dropped.

This frees up 16 GB of index space.

For golang/go#39629

Change-Id: Ia6baa1c47a00c2724ab85274787d68befc5ac92d
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/265247
Trust: Julie Qiu <julie@golang.org>
Run-TryBot: Julie Qiu <julie@golang.org>
TryBot-Result: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jamal Carvalho <jamal@golang.org>
gopherbot pushed a commit to golang/pkgsite that referenced this issue Oct 27, 2020
A search_documents.path_id column is added, which will become a FK to
the paths table.

For golang/go#39629

Change-Id: I4c24c1f93229afb758d4ca0a451fd3a4f6d01f0b
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/265248
Trust: Julie Qiu <julie@golang.org>
Run-TryBot: Julie Qiu <julie@golang.org>
TryBot-Result: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jamal Carvalho <jamal@golang.org>

@gopherbot gopherbot commented Oct 28, 2020

Change https://golang.org/cl/265762 mentions this issue: migrations: add search_documents.module_id column

@gopherbot gopherbot commented Nov 3, 2020

Change https://golang.org/cl/267437 mentions this issue: migrations: add paths table and units.path_id

gopherbot pushed a commit to golang/pkgsite that referenced this issue Nov 3, 2020
A paths table is added, which contains the path string for every path in
the units table.

This table is added to replace units.path in order to reduce the index
size on that column and improve performance.

For golang/go#39629

Change-Id: I9d173bbb8a1680176f1807d8d3561cd6906afaf3
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/267437
Trust: Julie Qiu <julie@golang.org>
Reviewed-by: Jonathan Amsterdam <jba@google.com>

@gopherbot gopherbot commented Nov 19, 2020

Change https://golang.org/cl/271746 mentions this issue: migrations: drop licenses.module_path and licenses.version

@gopherbot gopherbot commented Nov 20, 2020

Change https://golang.org/cl/272088 mentions this issue: migrations: drop licenses.module_path and licenses.version

@gopherbot gopherbot commented Nov 20, 2020

Change https://golang.org/cl/272087 mentions this issue: internal/postgres: stop writing to licenses.module_path and licenses.version

@gopherbot gopherbot commented Nov 20, 2020

Change https://golang.org/cl/272086 mentions this issue: migrations: set licenses.module_path and licenses.version as NOT NULL

@gopherbot gopherbot commented Nov 20, 2020

Change https://golang.org/cl/271747 mentions this issue: internal/postgres: use moduleID in getModuleLicenses

gopherbot pushed a commit to golang/pkgsite that referenced this issue Nov 20, 2020
getModuleLicenses now accepts moduleID instead of module path and
version as args.

In a later CL, we will be dropping licenses.module_path and
licenses.version.

For golang/go#39629

Change-Id: I18c525c584fd181372ef6c01241a90c19cb28a96
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/271747
Trust: Julie Qiu <julie@golang.org>
Run-TryBot: Julie Qiu <julie@golang.org>
TryBot-Result: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jonathan Amsterdam <jba@google.com>
gopherbot pushed a commit to golang/pkgsite that referenced this issue Nov 20, 2020
Drop the NOT NULL constraint on licenses.module_path and
licenses.version, so that we can stop inserting into these
columns in the next CL.

In a later CL, licenses.module_path and licenses.version will be
dropped.

For golang/go#39629

Change-Id: Ie917498306d46e03a761af32406356774a18fc78
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/272086
Trust: Julie Qiu <julie@golang.org>
Run-TryBot: Julie Qiu <julie@golang.org>
TryBot-Result: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jonathan Amsterdam <jba@google.com>
gopherbot pushed a commit to golang/pkgsite that referenced this issue Nov 20, 2020
…version

In a later CL, licenses.module_path and licenses.version will be
dropped.

For golang/go#39629

Change-Id: I7de03b66686e04128c761f477130eb98571a8cd2
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/272087
Trust: Julie Qiu <julie@golang.org>
Run-TryBot: Julie Qiu <julie@golang.org>
TryBot-Result: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jonathan Amsterdam <jba@google.com>
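
Taken together, the last three commits follow a common pattern for retiring columns without downtime: stop reading them, relax NOT NULL, stop writing them, and only then drop them. The sketch below lays out that ordering as data. The `step` type and `dropColumnsPlan` are hypothetical, and the SQL is standard PostgreSQL written for illustration; the actual migration files are not shown in this thread.

```go
package main

import "fmt"

// step is one unit of work in the plan: a schema migration when SQL is
// set, or an application code change when SQL is empty.
type step struct {
	Desc string
	SQL  string
}

// dropColumnsPlan orders the work so that no deployed code reads or writes
// a column at the moment it disappears.
func dropColumnsPlan(table string, cols ...string) []step {
	plan := []step{{Desc: "stop reading the columns in application code"}}
	for _, c := range cols {
		plan = append(plan, step{
			Desc: "allow NULLs in " + c,
			SQL:  fmt.Sprintf("ALTER TABLE %s ALTER COLUMN %s DROP NOT NULL", table, c),
		})
	}
	plan = append(plan, step{Desc: "stop writing the columns in application code"})
	for _, c := range cols {
		plan = append(plan, step{
			Desc: "drop " + c,
			SQL:  fmt.Sprintf("ALTER TABLE %s DROP COLUMN %s", table, c),
		})
	}
	return plan
}

func main() {
	for i, s := range dropColumnsPlan("licenses", "module_path", "version") {
		fmt.Printf("%d. %s %s\n", i+1, s.Desc, s.SQL)
	}
}
```

Relaxing the constraint before the write path changes matters because, with NOT NULL still in place, an insert that omits the columns would fail.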