Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update links to new docs domain #2400

Merged
merged 6 commits into from Mar 10, 2023
Merged

Update links to new docs domain #2400

merged 6 commits into from Mar 10, 2023

Conversation

stichbury
Copy link
Contributor

NOTE: Kedro datasets are moving from kedro.extras.datasets to a separate kedro-datasets package in
kedro-plugins repository. Any changes to the dataset implementations
should be done by opening a pull request in that repository.

Description

Update links to kedro.readthedocs.io in files such as README.md to point to our new docs.kedro.org subdomain.

I've yet to update any links in the Kedro python files (not sure whether to do this?) particularly to the big chunk of files in API docs for datasets.

I've also not yet looked at what should change in the /docs settings files themselves yet.

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>
@stichbury stichbury self-assigned this Mar 8, 2023
@stichbury stichbury added the Component: Documentation 📄 Issue/PR for markdown and API documentation label Mar 8, 2023
Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>
@stichbury stichbury marked this pull request as ready for review March 8, 2023 10:02
@astrojuanlu
Copy link
Member

I've yet to update any links in the Kedro python files (not sure whether to do this?) particularly to the big chunk of files in API docs for datasets.

Which files are you referring to? Do you mean the autogenerated files, or something else?

@stichbury
Copy link
Contributor Author

stichbury commented Mar 8, 2023

I've yet to update any links in the Kedro python files (not sure whether to do this?) particularly to the big chunk of files in API docs for datasets.

Which files are you referring to? Do you mean the autogenerated files, or something else?

If you grep for kedro.readthedocs.io in our repo, you'll find loads of python files under /kedro/extras/datasets that have references to docs pages. For example:

class GBQTableDataSet(AbstractDataSet[None, pd.DataFrame]):
    """``GBQTableDataSet`` loads and saves data from/to Google BigQuery.
    It uses pandas-gbq to read and write from/to BigQuery table.

    Example usage for the
    `YAML API <https://kedro.readthedocs.io/en/stable/data/\
    data_catalog.html#use-the-data-catalog-with-the-yaml-api>`_:

    .. code-block:: yaml
...

I myself feel that all those examples that are in the markdown docs, and thus referenced by the API docs, should move back into the API docs, to reduce the narrative docs pages size and break that dependency. But that's definitely not a task for this PR, and so I'm kind of dragging on whether to update them!


There's also quite a lot of Kedro code that references docs in error messages. Again, I think we need to update the links but also think about whether there's a better approach here, since any update to the docs is going to break them without careful redirecting (some are linking to anchors, so a simple subheading revision will cause a problem).

Copy link
Member

@astrojuanlu astrojuanlu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't check all the links, but made 1 replacement suggestion for one that still contained numbers in the URL (ugh) as well as other comments. Thanks @stichbury!

CONTRIBUTING.md Outdated Show resolved Hide resolved
README.md Show resolved Hide resolved
docs/source/hooks/common_use_cases.md Show resolved Hide resolved
@astrojuanlu
Copy link
Member

I myself feel that all those examples that are in the markdown docs, and thus referenced by the API docs, should move back into the API docs, to reduce the narrative docs pages size and break that dependency. But that's definitely not a task for this PR, and so I'm kind of dragging on whether to update them!

I think it's good to treat those separately yes. When we make progress on #2072 we can decide a better way to create those cross references that doesn't involve hardcoding the URL (I know that Sphinx allows for that, and possibly MkDocs does too), or decide to break the dependency as you suggest.

Co-authored-by: Juan Luis Cano Rodríguez <juan_cano@mckinsey.com>
Copy link
Member

@astrojuanlu astrojuanlu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Contributor

@ankatiyar ankatiyar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! 💯

@noklam
Copy link
Contributor

noklam commented Mar 10, 2023

I added the flag to disable linting error. See the report from CI here https://app.circleci.com/pipelines/github/kedro-org/kedro/19398/workflows/1a18e7aa-ac5b-4396-ac14-3486935d91ae/jobs/234128

Test results:
>> Issue: [B202:tarfile_unsafe_members] tarfile.extractall used without any validation. Please check and discard dangerous members.
   Severity: High   Confidence: High
   CWE: CWE-22 (https://cwe.mitre.org/data/definitions/22.html)
   More Info: https://bandit.readthedocs.io/en/1.7.5/plugins/b202_tarfile_unsafe_members.html
   Location: kedro/framework/cli/micropkg.py:314:4
313	            raise Exception("Failed to safely extract tar file.")
314	    tar.extractall(path)
315

This PR previously already add a check for files extract beyond its directory, however, bandit cannot catch this pattern. See this

Alternatively, we can further modify the safe_extract method so it follows the pattern the bandit recognize, i.e. tar.extract_all(destination, members=only_safe_members(tar). I don't think this is necessary but I want to see what @merelcht @ankatiyar thinks as you reviewed the original security PR.

@ankatiyar
Copy link
Contributor

Alternatively, we can further modify the safe_extract method so it follows the pattern the bandit recognize, i.e. tar.extract_all(destination, members=only_safe_members(tar). I don't think this is necessary but I want to see what @merelcht @ankatiyar thinks as you reviewed the original security PR.

I think it's okay to just disable the error since we it's the only place we use tarfile.extractall() and we already patched this security issue in #2180

@merelcht
Copy link
Member

Yes I agree with @ankatiyar that we've already patched this safely.

@stichbury stichbury merged commit 884f522 into main Mar 10, 2023
3 checks passed
@stichbury stichbury deleted the update-docs-links-in-md branch March 10, 2023 17:26
dannyrfar pushed a commit to dannyrfar/kedro that referenced this pull request Mar 21, 2023
* Update links to new docs domain

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* Update to conf.py

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* Update CONTRIBUTING.md

Co-authored-by: Juan Luis Cano Rodríguez <juan_cano@mckinsey.com>

* Disable as this is false alarm

Signed-off-by: Nok <nok_lam_chan@mckinsey.com>

---------

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>
Signed-off-by: Nok <nok_lam_chan@mckinsey.com>
Co-authored-by: Juan Luis Cano Rodríguez <juan_cano@mckinsey.com>
Co-authored-by: Nok <nok_lam_chan@mckinsey.com>
Signed-off-by: Danny Farah <danny.farah@quantumblack.com>
dannyrfar pushed a commit to dannyrfar/kedro that referenced this pull request Mar 22, 2023
* Update links to new docs domain

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* Update to conf.py

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* Update CONTRIBUTING.md

Co-authored-by: Juan Luis Cano Rodríguez <juan_cano@mckinsey.com>

* Disable as this is false alarm

Signed-off-by: Nok <nok_lam_chan@mckinsey.com>

---------

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>
Signed-off-by: Nok <nok_lam_chan@mckinsey.com>
Co-authored-by: Juan Luis Cano Rodríguez <juan_cano@mckinsey.com>
Co-authored-by: Nok <nok_lam_chan@mckinsey.com>
Signed-off-by: Danny Farah <danny.farah@quantumblack.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Documentation 📄 Issue/PR for markdown and API documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants