Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: initial POC of Jessica's fabric doc generator #2023

Merged
merged 20 commits into from
Aug 4, 2023

Conversation

mhamilton723
Copy link
Collaborator

Related Issues/PRs

#xxx

What changes are proposed in this pull request?

Briefly describe the changes included in this Pull Request.

How is this patch tested?

  • I have written tests (not required for typo or doc fix) and confirmed the proposed feature/bug-fix/change works.

Does this PR change any dependencies?

  • No. You can skip this section.
  • Yes. Make sure the dependencies are resolved correctly, and list changes here.

Does this PR add a new feature? If so, have you added samples on website?

  • No. You can skip this section.
  • Yes. Make sure you have added samples following below steps.
  1. Find the corresponding markdown file for your new feature in website/docs/documentation folder.
    Make sure you choose the correct class estimators/transformers and namespace.
  2. Follow the pattern in markdown file and add another section for your new API, including pyspark, scala (and .NET potentially) samples.
  3. Make sure the DocTable points to correct API link.
  4. Navigate to website folder, and run yarn run start to make sure the website renders correctly.
  5. Don't forget to add <!--pytest-codeblocks:cont--> before each python code blocks to enable auto-tests for python samples.
  6. Make sure the WebsiteSamplesTests job pass in the pipeline.

@github-actions
Copy link

Hey @mhamilton723 👋!
Thank you so much for contributing to our repository 🙌.
Someone from SynapseML Team will be reviewing this pull request soon.

We use semantic commit messages to streamline the release process.
Before your pull request can be merged, you should make sure your first commit and PR title start with a semantic prefix.
This helps us to create release messages and credit you for your hard work!

Examples of commit messages with semantic prefixes:

  • fix: Fix LightGBM crashes with empty partitions
  • feat: Make HTTP on Spark back-offs configurable
  • docs: Update Spark Serving usage
  • build: Add codecov support
  • perf: improve LightGBM memory usage
  • refactor: make python code generation rely on classes
  • style: Remove nulls from CNTKModel
  • test: Add test coverage for CNTKModel

To test your commit locally, please follow our guild on building from source.
Check out the developer guide for additional guidance on testing your change.

@JessicaXYWang
Copy link
Contributor

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@codecov-commenter
Copy link

codecov-commenter commented Aug 1, 2023

Codecov Report

Merging #2023 (b68d7de) into master (cde6834) will decrease coverage by 2.03%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #2023      +/-   ##
==========================================
- Coverage   87.07%   85.05%   -2.03%     
==========================================
  Files         306      306              
  Lines       16063    16063              
  Branches      852      852              
==========================================
- Hits        13987    13662     -325     
- Misses       2076     2401     +325     

see 28 files with indirect coverage changes

@JessicaXYWang
Copy link
Contributor

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

docs/Explore Algorithms/Responsible AI/Foo.ipynb Outdated Show resolved Hide resolved
tools/docgen/README.md Outdated Show resolved Hide resolved
tools/docgen/README.md Outdated Show resolved Hide resolved
tools/docgen/docgen/channels.py Outdated Show resolved Hide resolved
html = markdown.markdown(md, extensions=["markdown.extensions.tables", "markdown.extensions.fenced_code"])
parsed_html = BeautifulSoup(html)
# Download images and place them in media directory while updating their links
parsed_html = self._download_and_replace_images(parsed_html, resources, output_img_dir, os.path.dirname(output_file), None, False)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is in both branches of if state ent

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are slightly different. The only line that's the same is parsed_html = BeautifulSoup(html) is this OK?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the extra formatting steps seem like they would be no-ops on the first branch. We also want to remove useless style and cell output metadata in both branches. In this case they can be safely combined right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Extra style and output metadata are not likely to be in a rst file, but it won't hurt to combine.

# Download images and place them in media directory while updating their links
parsed_html = self._download_and_replace_images(parsed_html, resources, output_img_dir, os.path.dirname(output_file), None, False)
# Remove StatementMeta
for element in parsed_html.find_all(text=re.compile("StatementMeta\(.*?Available\)")):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two can be put in both branches of it statement right, also statement meta check should be pushed upstream to the actual notebooks when possible because we don’t want them there either

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for Sempy doc. We do not have that in our notebooks and they have that line in some of their example. Just want to put it there to have a cleaner output for their samples.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes makes sense, but this would be a no-op for us (And would be helpful if our docs had those by mistake)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a warning... do you want to it be an error?

tools/docgen/docgen/manifest.yaml Outdated Show resolved Hide resolved
tools/docgen/docgen/manifest.yaml Show resolved Hide resolved
tools/docgen/setup.py Outdated Show resolved Hide resolved
tools/docgen/docgen/channels.py Outdated Show resolved Hide resolved
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary by GPT-4

The changes in this commit include:

  1. Adding new dependencies to environment.yml for mistletoe, pypandoc, markdownify, and traitlets.
  2. Creating a new README.md file for the doc generating pipeline onboarding for the Fabric channel.
  3. Updating the channels.py file to include a new class called FabricChannel with various methods for processing input files, downloading and replacing images, validating metadata, generating metadata headers, reading RST files, converting to markdown links, and more.
  4. Modifying the core.py file to update the process method with an index parameter.
  5. Updating the manifest.yaml file with new metadata for various notebooks related to FabricChannel.
  6. Updating the setup.py file to include new dependencies like pypandoc, markdownify, and traitlets.

These changes are mainly focused on adding support for a new Fabric channel in the doc generating pipeline and updating related files with necessary modifications and dependencies.

Suggestions

No suggestions are needed as the changes in this PR seem to be well implemented and organized.

@JessicaXYWang
Copy link
Contributor

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@JessicaXYWang
Copy link
Contributor

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@JessicaXYWang
Copy link
Contributor

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@JessicaXYWang
Copy link
Contributor

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mhamilton723
Copy link
Collaborator Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mhamilton723
Copy link
Collaborator Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@JessicaXYWang
Copy link
Contributor

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@JessicaXYWang
Copy link
Contributor

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mhamilton723 mhamilton723 merged commit 9eff35f into microsoft:master Aug 4, 2023
67 of 68 checks passed
JessicaXYWang added a commit to JessicaXYWang/SynapseML that referenced this pull request Sep 14, 2023
* docs: initial POC of Jessica's fabric doc generator

* update fabric channel

* update fabric channel - rst file

* update fabric channel

* update fabric channel

* add readme, resolve conflict

* add install requires

* update fabric channel

* format channel

* add back WebsiteChannel

* formatting docgen

* Update tools/docgen/docgen/core.py

* Update tools/docgen/docgen/core.py

* fix index issue

* raise warning for if statementmeta in notebookcell output

---------

Co-authored-by: Jessica Wang <jessiwang@microsoft.com>
Co-authored-by: JessicaXYWang <108437381+JessicaXYWang@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants