docs: initial POC of Jessica's fabric doc generator #2023

mhamilton723 · 2023-07-20T01:09:45Z

Related Issues/PRs

#xxx

What changes are proposed in this pull request?

Briefly describe the changes included in this Pull Request.

How is this patch tested?

I have written tests (not required for typo or doc fix) and confirmed the proposed feature/bug-fix/change works.

Does this PR change any dependencies?

No. You can skip this section.
Yes. Make sure the dependencies are resolved correctly, and list changes here.

Does this PR add a new feature? If so, have you added samples on website?

No. You can skip this section.
Yes. Make sure you have added samples following below steps.

Find the corresponding markdown file for your new feature in website/docs/documentation folder.
Make sure you choose the correct class estimators/transformers and namespace.
Follow the pattern in markdown file and add another section for your new API, including pyspark, scala (and .NET potentially) samples.
Make sure the DocTable points to correct API link.
Navigate to website folder, and run yarn run start to make sure the website renders correctly.
Don't forget to add  before each Python code block to enable auto-tests for Python samples.
Make sure the WebsiteSamplesTests job passes in the pipeline.

github-actions · 2023-07-20T01:09:58Z

Hey @mhamilton723 👋!
Thank you so much for contributing to our repository 🙌.
Someone from SynapseML Team will be reviewing this pull request soon.

We use semantic commit messages to streamline the release process.
Before your pull request can be merged, you should make sure your first commit and PR title start with a semantic prefix.
This helps us to create release messages and credit you for your hard work!

Examples of commit messages with semantic prefixes:

fix: Fix LightGBM crashes with empty partitions
feat: Make HTTP on Spark back-offs configurable
docs: Update Spark Serving usage
build: Add codecov support
perf: improve LightGBM memory usage
refactor: make python code generation rely on classes
style: Remove nulls from CNTKModel
test: Add test coverage for CNTKModel

To test your commit locally, please follow our guide on building from source.
Check out the developer guide for additional guidance on testing your change.

update fabric channel

JessicaXYWang · 2023-08-01T07:34:10Z

/azp run

azure-pipelines · 2023-08-01T07:34:19Z

Azure Pipelines successfully started running 1 pipeline(s).

codecov-commenter · 2023-08-01T08:07:20Z

Codecov Report

Merging #2023 (b68d7de) into master (cde6834) will decrease coverage by 2.03%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #2023      +/-   ##
==========================================
- Coverage   87.07%   85.05%   -2.03%     
==========================================
  Files         306      306              
  Lines       16063    16063              
  Branches      852      852              
==========================================
- Hits        13987    13662     -325     
- Misses       2076     2401     +325

see 28 files with indirect coverage changes

…23/SynapseML into rebase-fabric-channel

JessicaXYWang · 2023-08-01T08:10:04Z

/azp run

azure-pipelines · 2023-08-01T08:10:14Z

Azure Pipelines successfully started running 1 pipeline(s).

docs/Explore Algorithms/Responsible AI/Foo.ipynb

tools/docgen/README.md

tools/docgen/docgen/channels.py

mhamilton723 · 2023-08-01T10:12:25Z

tools/docgen/docgen/channels.py

+            html = markdown.markdown(md, extensions=["markdown.extensions.tables", "markdown.extensions.fenced_code"])
+            parsed_html = BeautifulSoup(html)
+            # Download images and place them in media directory while updating their links
+            parsed_html = self._download_and_replace_images(parsed_html, resources, output_img_dir, os.path.dirname(output_file), None, False)


Looks like this is in both branches of the if statement.

They are slightly different. The only line that's the same is parsed_html = BeautifulSoup(html). Is this OK?

The extra formatting steps seem like they would be no-ops on the first branch. We also want to remove useless style and cell output metadata in both branches. In that case they can be safely combined, right?

Sure. Extra style and output metadata are not likely to be in an RST file, but it won't hurt to combine them.
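
A minimal sketch of the combined structure discussed in this thread, assuming the two branches differ only in how the input is turned into HTML and then share every cleanup step. `rst_to_html`, `notebook_to_html`, `CONVERTERS`, `SHARED_STEPS`, and `convert` are hypothetical names, not the actual channels.py code, and the sketch works on HTML strings for brevity where the PR works on a BeautifulSoup tree.

```python
import os
import re
from typing import Callable, Dict, List


# Hypothetical stand-ins for the format-specific conversion in channels.py;
# in the real code these produce HTML that is then parsed with BeautifulSoup.
def rst_to_html(text: str) -> str:
    return f"<div>{text}</div>"


def notebook_to_html(text: str) -> str:
    return f'<div class="nb">{text}</div>'


# Only this step differs between the two branches of the if statement.
CONVERTERS: Dict[str, Callable[[str], str]] = {
    ".rst": rst_to_html,
    ".ipynb": notebook_to_html,
}

# Raw string, so "\(" reaches the regex engine instead of being a string escape.
STATEMENT_META = re.compile(r"StatementMeta\(.*?Available\)")


def strip_statement_meta(html: str) -> str:
    # No-op for inputs that never contain StatementMeta (e.g. most .rst files).
    return STATEMENT_META.sub("", html)


def strip_inline_styles(html: str) -> str:
    # Regex-based for brevity; a real implementation would edit the parsed tree.
    return re.sub(r'\s+style="[^"]*"', "", html)


# Shared post-processing, applied once after whichever converter ran.
SHARED_STEPS: List[Callable[[str], str]] = [strip_statement_meta, strip_inline_styles]


def convert(path: str, text: str) -> str:
    ext = os.path.splitext(path)[1]
    html = CONVERTERS[ext](text)
    for step in SHARED_STEPS:
        html = step(html)
    return html
```

Because the shared steps are no-ops on inputs without StatementMeta lines or inline styles, running them on the RST branch as well costs nothing, which is the point made in the replies above.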

mhamilton723 · 2023-08-01T10:13:24Z

tools/docgen/docgen/channels.py

+            # Download images and place them in media directory while updating their links
+            parsed_html = self._download_and_replace_images(parsed_html, resources, output_img_dir, os.path.dirname(output_file), None, False)
+            # Remove StatementMeta
+            for element in parsed_html.find_all(text=re.compile("StatementMeta\(.*?Available\)")):


These two can be put in both branches of the if statement, right? Also, the StatementMeta check should be pushed upstream to the actual notebooks when possible, because we don’t want them there either.

This is for the Sempy docs. We do not have that in our notebooks, but they have that line in some of their examples. I just want to keep it there to get a cleaner output for their samples.

Yes, makes sense, but this would be a no-op for us (and would be helpful if our docs had those by mistake).

Adding a warning... do you want it to be an error?
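
The thread leaves open whether a StatementMeta hit should warn or fail; the last commit before the merge ("raise warning for if statementmeta in notebookcell output") went with a warning. A minimal sketch of such a check, assuming notebooks are loaded as nbformat-style JSON dicts. `find_statement_meta`, `check_statement_meta`, and the `strict` flag are hypothetical names, not the docgen API; `strict=True` only illustrates what the error variant would look like.

```python
import re
import warnings
from typing import Any, Dict, List

# Same pattern as in the diff above, written as a raw string.
STATEMENT_META = re.compile(r"StatementMeta\(.*?Available\)")


def find_statement_meta(notebook: Dict[str, Any]) -> List[int]:
    """Return indices of code cells whose outputs contain a StatementMeta line."""
    hits = []
    for idx, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        for output in cell.get("outputs", []):
            # Stream outputs keep text under "text"; rich outputs under data["text/plain"].
            text = output.get("text") or output.get("data", {}).get("text/plain", "")
            if isinstance(text, list):  # nbformat allows multiline strings as lists
                text = "".join(text)
            if STATEMENT_META.search(text):
                hits.append(idx)
                break
    return hits


def check_statement_meta(notebook: Dict[str, Any], path: str, strict: bool = False) -> List[int]:
    """Warn (or raise, if strict) when StatementMeta output is found."""
    hits = find_statement_meta(notebook)
    if hits:
        msg = (f"{path}: StatementMeta found in output of cell(s) {hits}; "
               "consider clearing it in the source notebook")
        if strict:
            raise ValueError(msg)
        warnings.warn(msg)
    return hits
```

A warning keeps third-party samples (such as the Sempy ones) building while still flagging the stray output; the strict variant would suit notebooks the team controls.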

tools/docgen/docgen/manifest.yaml

tools/docgen/setup.py

tools/docgen/docgen/channels.py

tools/docgen/docgen/manifest.yaml

github-actions

Summary by GPT-4

The changes in this commit include:

Adding new dependencies to environment.yml for mistletoe, pypandoc, markdownify, and traitlets.
Creating a new README.md file for the doc generating pipeline onboarding for the Fabric channel.
Updating the channels.py file to include a new class called FabricChannel with various methods for processing input files, downloading and replacing images, validating metadata, generating metadata headers, reading RST files, converting to markdown links, and more.
Modifying the core.py file to update the process method with an index parameter.
Updating the manifest.yaml file with new metadata for various notebooks related to FabricChannel.
Updating the setup.py file to include new dependencies like pypandoc, markdownify, and traitlets.

These changes are mainly focused on adding support for a new Fabric channel in the doc generating pipeline and updating related files with necessary modifications and dependencies.

Suggestions

No suggestions are needed as the changes in this PR seem to be well implemented and organized.

JessicaXYWang · 2023-08-03T06:15:38Z

/azp run

azure-pipelines · 2023-08-03T06:15:47Z

Azure Pipelines successfully started running 1 pipeline(s).

…23/SynapseML into rebase-fabric-channel

JessicaXYWang · 2023-08-03T07:11:12Z

/azp run

azure-pipelines · 2023-08-03T07:11:22Z

Azure Pipelines successfully started running 1 pipeline(s).

JessicaXYWang · 2023-08-03T07:38:11Z

/azp run

azure-pipelines · 2023-08-03T07:38:22Z

Azure Pipelines successfully started running 1 pipeline(s).

JessicaXYWang · 2023-08-03T08:19:05Z

/azp run

azure-pipelines · 2023-08-03T08:19:15Z

Azure Pipelines successfully started running 1 pipeline(s).

tools/docgen/docgen/core.py

mhamilton723 · 2023-08-04T01:10:04Z

/azp run

azure-pipelines · 2023-08-04T01:10:14Z

Azure Pipelines successfully started running 1 pipeline(s).

tools/docgen/docgen/core.py

mhamilton723 · 2023-08-04T01:24:18Z

/azp run

azure-pipelines · 2023-08-04T01:24:27Z

Azure Pipelines successfully started running 1 pipeline(s).

JessicaXYWang · 2023-08-04T06:29:48Z

/azp run

azure-pipelines · 2023-08-04T06:29:58Z

Azure Pipelines successfully started running 1 pipeline(s).

JessicaXYWang · 2023-08-04T09:14:30Z

/azp run

azure-pipelines · 2023-08-04T09:14:39Z

Azure Pipelines successfully started running 1 pipeline(s).

* docs: initial POC of Jessica's fabric doc generator * update fabric channel * update fabric channel - rst file * update fabric channel * update fabric channel * add readme, resolve conflict * add install requires * update fabric channel * format channel * add back WebsiteChannel * formatting docgen * Update tools/docgen/docgen/core.py * Update tools/docgen/docgen/core.py * fix index issue * raise warning for if statementmeta in notebookcell output --------- Co-authored-by: Jessica Wang <jessiwang@microsoft.com> Co-authored-by: JessicaXYWang <108437381+JessicaXYWang@users.noreply.github.com>

docs: initial POC of Jessica's fabric doc generator

acc44c3

JessicaXYWang and others added 7 commits July 24, 2023 08:01

update fabric channel

7f8ac64

update fabric channel - rst file

380016a

update fabric channel

388aaf4

Merge pull request #10 from JessicaXYWang/rebase-fabric-channel-jessica

fc8873b

update fabric channel

update fabric channel

8bf5db9

add readme, resolve conflict

6b9d35f

Merge branch 'master' into rebase-fabric-channel

1242b28

JessicaXYWang added 2 commits August 1, 2023 01:08

add install requires

959ad2f

Merge branch 'rebase-fabric-channel' of https://github.com/mhamilton7…

f7538ed

…23/SynapseML into rebase-fabric-channel

mhamilton723 commented Aug 1, 2023

tools/docgen/docgen/manifest.yaml (outdated)

JessicaXYWang and others added 2 commits August 2, 2023 23:09

update fabric channel

16d75d2

Merge branch 'master' into rebase-fabric-channel

f06c3a2

github-actions bot reviewed Aug 3, 2023

JessicaXYWang added 2 commits August 3, 2023 00:03

format channel

33ffd77

Merge branch 'rebase-fabric-channel' of https://github.com/mhamilton7…

795969b

…23/SynapseML into rebase-fabric-channel

add back WebsiteChannel

912dfdf

formatting docgen

c298800

mhamilton723 commented Aug 4, 2023

tools/docgen/docgen/core.py (outdated)

Update tools/docgen/docgen/core.py

8d983c2

mhamilton723 commented Aug 4, 2023

tools/docgen/docgen/core.py (outdated)

Update tools/docgen/docgen/core.py

b4829c1

fix index issue

7f176e5

raise warning for if statementmeta in notebookcell output

b68d7de

mhamilton723 merged commit 9eff35f into microsoft:master Aug 4, 2023
67 of 68 checks passed
