Commit
Merge pull request #5 from rcrowe-google/Kshitijaa/base/boilerplate
Added boilerplate code from Hello World example
Showing 12 changed files with 499 additions and 2 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
# Contribution Guidelines | ||
|
||
## Directory Structure | ||
The repo contains three main directories as follows: | ||
- **[Component](./component):** Contains the main component code with a separate file for the executor code | ||
- **[Data](./data):** Contains the sample data to be used for testing | ||
- **[Example](./example):** Contains example code to test our component with the CSVs present in [data](./data) | ||
|
||
## A few Git and GitHub practices | ||
|
||
### Commits | ||
Commits serve as checkpoints during your workflow and can be used to **revert** to an earlier state if something goes wrong. | ||
- **When to commit:** Try not to pile many changes into a single commit, while also avoiding too many commits for a small fix. | ||
- **Commit messages:** Commit messages should be descriptive enough for an outside reader to understand what the commit accomplishes, without exceeding 50 characters. | ||
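
The 50-character limit is easy to check mechanically. The sketch below is only an illustration, not part of this repository: the function name `check_commit_subject` is hypothetical, and it assumes the limit applies to the first (subject) line of the message.

```python
MAX_SUBJECT_LENGTH = 50  # limit taken from the guideline above


def check_commit_subject(message: str) -> list:
    """Return a list of problems found in the first line of a commit message."""
    lines = message.splitlines()
    subject = lines[0].strip() if lines else ""
    problems = []
    if not subject:
        problems.append("subject line is empty")
    elif len(subject) > MAX_SUBJECT_LENGTH:
        problems.append(
            f"subject is {len(subject)} characters; limit is {MAX_SUBJECT_LENGTH}")
    return problems
```

A check like this could be called from a local `commit-msg` hook, although wiring it up that way is left out here.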
|
||
Check out [this](https://gist.github.com/turbo/efb8d57c145e00dc38907f9526b60f17) for more information about good practices. | ||
|
||
### Branches | ||
Branches are a good way to work on different features simultaneously. Check out [git-scm](https://git-scm.com/book/en/v2/Git-Branching-Basic-Branching-and-Merging) to learn more about the concepts involved. | ||
|
||
For descriptive branch names, it is a good idea to use the following format: | ||
**`name/keyword/short-description`** | ||
- **Name:** Name of the person(s) working on the branch. This can be omitted if many people (>2) are expected to work on it. | ||
- **Keyword:** This describes what "type" of work this branch is supposed to do. These are typically named as: | ||
- `feature`: Adding/expanding a feature | ||
- `base`: Adding boilerplate/readme/templates etc. | ||
- `bug`: Fixes a bug | ||
- `junk`: Throwaway branch created to experiment | ||
- **Short description:** As the name suggests, this is a short description of the branch, usually no longer than 2-3 words separated by hyphens (`-`). | ||
|
||
P.S. If multiple branches are being used to work on the same issue (say issue `#n`), they can be named as `name/keyword/#n-short-description` | ||
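
As a rough illustration of the naming scheme, the sketch below parses a branch name with a regular expression. The pattern and the `parse_branch_name` helper are assumptions made for this example, not tooling that exists in the repo. The word count of the description is deliberately not enforced, since the guideline only says "usually".

```python
import re

# Keywords listed in the guideline above.
KEYWORDS = ("feature", "base", "bug", "junk")

BRANCH_PATTERN = re.compile(
    r"^(?:(?P<name>[A-Za-z0-9_.-]+)/)?"          # optional person name
    r"(?P<keyword>" + "|".join(KEYWORDS) + r")/"  # type of work
    r"(?:#(?P<issue>\d+)-)?"                      # optional issue number
    r"(?P<description>[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*)$"  # hyphenated words
)


def parse_branch_name(branch: str):
    """Return the parts of a branch name, or None if it does not fit the format."""
    match = BRANCH_PATTERN.match(branch)
    return match.groupdict() if match else None
```

For instance, the branch merged in this commit, `Kshitijaa/base/boilerplate`, parses into the name `Kshitijaa`, the keyword `base`, and the description `boilerplate`.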
|
||
### Issues | ||
The following points should be considered while creating new issues: | ||
- Use relevant labels like `bug`, `feature` etc. | ||
- If the team has decided who will work on it, the issue should be **assigned** to that person as soon as possible to prevent the same work being done twice. | ||
- The issue should be linked in the **project** if needed, and its status should be kept up to date as the work progresses. | ||
|
||
### Pull Requests | ||
It is always a good idea to ensure the following are present in your Pull Request description: | ||
- Relevant issue/s | ||
- What it accomplished | ||
- Mention `[WIP]` in title and make it a `Draft Pull Request` if it is a work in progress | ||
- Once the pull request is final, it should be **requested for review** from the concerned people |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,39 @@ | ||
#### SIG TFX-Addons | ||
# Project Proposal | ||
|
||
**Your name:** Pratishtha Abrol | ||
|
||
**Your email:** pratishthaabrol@gmail.com | ||
|
||
**Your company/organization:** Outreachy | ||
|
||
**Project name:** [Schema curation custom component](https://github.com/tensorflow/tfx-addons/issues/8) | ||
|
||
## Project Description | ||
This project applies Python user code from a user-supplied module file to a schema produced by SchemaGen, to curate the schema based on domain knowledge. | ||
|
||
## Project Category | ||
Component | ||
|
||
## Project Use-Case(s) | ||
This project will allow the user to add a custom component that modifies the schema generated by the SchemaGen component according to the user's domain knowledge, for example, fixing domain limits that were inferred wrongly by the SchemaGen component. | ||
|
||
## Project Implementation | ||
Implementation of the Schema Curation Custom Component can be done using the following approach: | ||
- Get the base schema using the SchemaGen component of TFX. | ||
- The user supplies a module file with a fully-custom component that defines the additions/changes to the schema initially generated by SchemaGen. | ||
- An execution script runs on the module file, which sets and modifies variables accordingly. | ||
- The base schema is modified according to the module file and used further along the pipeline. | ||
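
The steps above can be sketched with the standard library alone. Everything in this sketch is an assumption made for illustration: the schema is a plain dictionary standing in for the real schema object, the function name `schema_fn` is hypothetical, and the proposal does not specify how the module file is loaded. It is not the component's actual implementation.

```python
import importlib.util
import os
import tempfile


def load_module_from_file(module_path: str):
    """Import a Python source file as a module object."""
    spec = importlib.util.spec_from_file_location("user_schema_module", module_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def curate(schema: dict, module_path: str) -> dict:
    """Apply the user's `schema_fn` (hypothetical name) to a copy of the schema."""
    module = load_module_from_file(module_path)
    curated = {name: dict(feature) for name, feature in schema.items()}
    module.schema_fn(curated)
    return curated


# Stand-in for a schema inferred by SchemaGen (a plain dict, for illustration only).
base_schema = {"trip_start_hour": {"min": 1, "max": 12}}

# Stand-in for the user-supplied module file.
USER_MODULE = '''
def schema_fn(schema):
    # Domain knowledge: hours of the day run from 0 to 23.
    schema["trip_start_hour"].update(min=0, max=23)
'''

with tempfile.TemporaryDirectory() as tmp_dir:
    module_path = os.path.join(tmp_dir, "module_file.py")
    with open(module_path, "w") as f:
        f.write(USER_MODULE)
    curated_schema = curate(base_schema, module_path)
```

Working on a copy keeps the inferred schema intact, so the original and curated versions can be compared later in the pipeline.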
|
||
## Project Dependencies | ||
The implementation will use the [TFDV library](https://www.tensorflow.org/tfx/data_validation/api_docs/python/tfdv) for validation and modification of schema objects according to the module file provided by the user. The following two methods would be of special focus: | ||
- [tfdv.set_domain](https://www.tensorflow.org/tfx/data_validation/api_docs/python/tfdv/set_domain) | ||
- [tfdv.write_schema_text](https://www.tensorflow.org/tfx/data_validation/api_docs/python/tfdv/write_schema_text) | ||
|
||
A similar implementation can be seen in the [Transform library](https://github.com/tensorflow/transform). Particularly, the [schema_utils](https://github.com/tensorflow/transform/blob/master/tensorflow_transform/tf_metadata/schema_utils.py) module could be useful. | ||
|
||
## Project Team | ||
**Project Leader** : Pratishtha Abrol, pratishtha-abrol, pratishthaabrol@gmail.com | ||
1. Fatimah Adwan, FatimahAdwan, akilahafaf72@gmail.com | ||
2. Kshitijaa Jaglan, deutranium, jaglan.kshitijaa2@gmail.com | ||
3. Nirzari Gupta, nirzu97, nirzu97@gmail.com |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,17 @@ | ||
# schemacomponent | ||
Outreachy TFX custom component project | ||
# Schema Curation Custom Component | ||
|
||
> Outreachy TFX custom component project | ||
This repo contains the code for Schema Curation Custom Component made as a part of [TFX-Addons](https://github.com/tensorflow/tfx-addons/) through the [Outreachy](https://www.outreachy.org/outreachy-may-2021-internship-round/communities/tensorflow/#create-custom-components-and-tools-for-tensorflow-) program. You may view the linked Pull Request in TFX-Addons [here](https://github.com/tensorflow/tfx-addons/pull/32) and the issue [here](https://github.com/tensorflow/tfx-addons/issues/8) for relevant discussions related to the project. | ||
|
||
## The Team: | ||
### Mentors: | ||
- Robert Crowe | ||
- Thea Lamkin | ||
- Josh Gordon | ||
|
||
### Interns: | ||
- [Fatimah Adwan](https://github.com/FatimahAdwan/FatimahAdwan) | ||
- [Kshitijaa Jaglan](https://github.com/deutranium/) | ||
- [Nirzari Gupta](https://github.com/Nirzu97) | ||
- [Pratishtha Abrol](https://github.com/pratishtha-abrol) |
Empty file.
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,88 @@ | ||
# Lint as: python3 | ||
# Copyright 2019 Google LLC. All Rights Reserved. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
"""Example of a Hello World TFX custom component. | ||
This custom component simply reads tf.Examples from input and passes through as | ||
output. This is meant to serve as a kind of starting point example for creating | ||
custom components. | ||
This component along with other custom component related code will only serve as | ||
an example and will not be supported by TFX team. | ||
""" | ||
|
||
from typing import Optional, Text | ||
|
||
from tfx import types | ||
from tfx.dsl.components.base import base_component | ||
from tfx.dsl.components.base import executor_spec | ||
from tfx.examples.custom_components.hello_world.hello_component import executor | ||
from tfx.types import channel_utils | ||
from tfx.types import standard_artifacts | ||
from tfx.types.component_spec import ChannelParameter | ||
from tfx.types.component_spec import ExecutionParameter | ||
|
||
|
||
class HelloComponentSpec(types.ComponentSpec): | ||
"""ComponentSpec for Custom TFX Hello World Component.""" | ||
|
||
PARAMETERS = { | ||
# These are parameters that will be passed in the call to | ||
# create an instance of this component. | ||
'name': ExecutionParameter(type=Text), | ||
} | ||
INPUTS = { | ||
# This will be a dictionary with input artifacts, including URIs | ||
'input_data': ChannelParameter(type=standard_artifacts.Examples), | ||
} | ||
OUTPUTS = { | ||
# This will be a dictionary which this component will populate | ||
'output_data': ChannelParameter(type=standard_artifacts.Examples), | ||
} | ||
|
||
|
||
class HelloComponent(base_component.BaseComponent): | ||
"""Custom TFX Hello World Component. | ||
This custom component class consists of only a constructor. | ||
""" | ||
|
||
SPEC_CLASS = HelloComponentSpec | ||
EXECUTOR_SPEC = executor_spec.ExecutorClassSpec(executor.Executor) | ||
|
||
def __init__(self, | ||
input_data: Optional[types.Channel] = None, | ||
output_data: Optional[types.Channel] = None, | ||
name: Optional[Text] = None): | ||
"""Construct a HelloComponent. | ||
Args: | ||
input_data: A Channel of type `standard_artifacts.Examples`. This will | ||
often contain two splits: 'train', and 'eval'. | ||
output_data: A Channel of type `standard_artifacts.Examples`. This will | ||
usually contain the same splits as input_data. | ||
name: Optional unique name. Necessary if multiple Hello components are | ||
declared in the same pipeline. | ||
""" | ||
# output_data will contain a list of Channels for each split of the data, | ||
# by default a 'train' split and an 'eval' split. Since HelloComponent | ||
# passes the input data through to output, the splits in output_data will | ||
# be the same as the splits in input_data, which were generated by the | ||
# upstream component. | ||
if not output_data: | ||
output_data = channel_utils.as_channel([standard_artifacts.Examples()]) | ||
|
||
spec = HelloComponentSpec(input_data=input_data, | ||
output_data=output_data, name=name) | ||
super(HelloComponent, self).__init__(spec=spec) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,52 @@ | ||
# Lint as: python3 | ||
# Copyright 2019 Google LLC. All Rights Reserved. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
"""Tests for HelloComponent.""" | ||
|
||
import json | ||
|
||
import tensorflow as tf | ||
|
||
from tfx.examples.custom_components.hello_world.hello_component import component | ||
from tfx.types import artifact | ||
from tfx.types import channel_utils | ||
from tfx.types import standard_artifacts | ||
|
||
|
||
class HelloComponentTest(tf.test.TestCase): | ||
|
||
def setUp(self): | ||
super(HelloComponentTest, self).setUp() | ||
self.name = 'HelloWorld' | ||
|
||
def testConstruct(self): | ||
input_data = standard_artifacts.Examples() | ||
input_data.split_names = json.dumps(artifact.DEFAULT_EXAMPLE_SPLITS) | ||
output_data = standard_artifacts.Examples() | ||
output_data.split_names = json.dumps(artifact.DEFAULT_EXAMPLE_SPLITS) | ||
this_component = component.HelloComponent( | ||
input_data=channel_utils.as_channel([input_data]), | ||
output_data=channel_utils.as_channel([output_data]), | ||
name=u'Testing123') | ||
self.assertEqual(standard_artifacts.Examples.TYPE_NAME, | ||
this_component.outputs['output_data'].type_name) | ||
artifact_collection = this_component.outputs['output_data'].get() | ||
for artifacts in artifact_collection: | ||
split_list = json.loads(artifacts.split_names) | ||
self.assertEqual(sorted(artifact.DEFAULT_EXAMPLE_SPLITS), | ||
sorted(split_list)) | ||
|
||
|
||
if __name__ == '__main__': | ||
tf.test.main() |
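
A note on order-insensitive list comparisons in tests such as `testConstruct`: `list.sort()` sorts in place and returns `None`, so comparing the results of two `.sort()` calls always succeeds, whereas `sorted()` returns a new list whose contents are actually compared. The short sketch below uses made-up split names to show the difference.

```python
expected = ["train", "eval"]
actual = ["eval", "train", "test"]  # deliberately different from expected

# list.sort() sorts in place and returns None, so this compares None to None.
in_place_results_equal = expected.copy().sort() == actual.copy().sort()

# sorted() returns a new list, so this compares the actual contents.
sorted_results_equal = sorted(expected) == sorted(actual)
```

Using `sorted()` also avoids mutating a shared constant as a side effect of the assertion.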
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,88 @@ | ||
# Lint as: python3 | ||
# Copyright 2019 Google LLC. All Rights Reserved. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
"""Example of a Hello World TFX custom component. | ||
This custom component simply passes examples through. This is meant to serve as | ||
a kind of starting point example for creating custom components. | ||
This component along with other custom component related code will only serve as | ||
an example and will not be supported by TFX team. | ||
""" | ||
|
||
import json | ||
import os | ||
from typing import Any, Dict, List, Text | ||
|
||
|
||
from tfx import types | ||
from tfx.dsl.components.base import base_executor | ||
from tfx.dsl.io import fileio | ||
from tfx.types import artifact_utils | ||
from tfx.utils import io_utils | ||
|
||
|
||
class Executor(base_executor.BaseExecutor): | ||
"""Executor for HelloComponent.""" | ||
|
||
def Do(self, input_dict: Dict[Text, List[types.Artifact]], | ||
output_dict: Dict[Text, List[types.Artifact]], | ||
exec_properties: Dict[Text, Any]) -> None: | ||
"""Copy the input_data to the output_data. | ||
For this example that is all that the Executor does. For a different | ||
custom component, this is where the real functionality of the component | ||
would be included. | ||
This component both reads and writes Examples, but a different component | ||
might read and write artifacts of other types. | ||
Args: | ||
input_dict: Input dict from input key to a list of artifacts, including: | ||
- input_data: A list of type `standard_artifacts.Examples` which will | ||
often contain two splits, 'train' and 'eval'. | ||
output_dict: Output dict from key to a list of artifacts, including: | ||
- output_data: A list of type `standard_artifacts.Examples` which will | ||
usually contain the same splits as input_data. | ||
exec_properties: A dict of execution properties, including: | ||
- name: Optional unique name. Necessary iff multiple Hello components | ||
are declared in the same pipeline. | ||
Returns: | ||
None | ||
Raises: | ||
OSError and its subclasses | ||
""" | ||
self._log_startup(input_dict, output_dict, exec_properties) | ||
|
||
input_artifact = artifact_utils.get_single_instance( | ||
input_dict['input_data']) | ||
output_artifact = artifact_utils.get_single_instance( | ||
output_dict['output_data']) | ||
output_artifact.split_names = input_artifact.split_names | ||
|
||
split_to_instance = {} | ||
|
||
for split in json.loads(input_artifact.split_names): | ||
uri = artifact_utils.get_split_uri([input_artifact], split) | ||
split_to_instance[split] = uri | ||
|
||
for split, instance in split_to_instance.items(): | ||
input_dir = instance | ||
output_dir = artifact_utils.get_split_uri([output_artifact], split) | ||
for filename in fileio.listdir(input_dir): | ||
input_uri = os.path.join(input_dir, filename) | ||
output_uri = os.path.join(output_dir, filename) | ||
io_utils.copy_file(src=input_uri, dst=output_uri, overwrite=True) |
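
The per-split copy loop in `Do` can be tried out without TFX. The sketch below is a simplified stand-in under stated assumptions: each split lives in a subdirectory named after the split (in TFX the real layout is decided by `artifact_utils.get_split_uri`), `shutil.copy` replaces `io_utils.copy_file`, and the helper name `copy_splits` is hypothetical.

```python
import os
import shutil
import tempfile


def copy_splits(input_uri: str, output_uri: str, splits: list) -> None:
    """Copy every file of every split from input_uri to output_uri."""
    for split in splits:
        input_dir = os.path.join(input_uri, split)
        output_dir = os.path.join(output_uri, split)
        os.makedirs(output_dir, exist_ok=True)
        for filename in os.listdir(input_dir):
            shutil.copy(os.path.join(input_dir, filename),
                        os.path.join(output_dir, filename))


with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, "input")
    dst = os.path.join(root, "output")
    # Create one small file per split to stand in for the Examples data.
    for split in ("train", "eval"):
        os.makedirs(os.path.join(src, split))
        with open(os.path.join(src, split, "data.csv"), "w") as f:
            f.write(f"{split} rows\n")
    copy_splits(src, dst, ["train", "eval"])
    copied = {split: sorted(os.listdir(os.path.join(dst, split)))
              for split in ("train", "eval")}
```

A component that transforms data rather than passing it through would replace the copy call with its own processing of each file.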
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
pickup_community_area,fare,trip_start_month,trip_start_hour,trip_start_day,trip_start_timestamp,pickup_latitude,pickup_longitude,dropoff_latitude,dropoff_longitude,trip_miles,pickup_census_tract,dropoff_census_tract,payment_type,company,trip_seconds,dropoff_community_area,tips | ||
60,27.05,10,2,3,1380593700,41.836150155,-87.648787952,,,12.6,,,Cash,Taxi Affiliation Services,1380,,0.0 | ||
10,5.85,10,1,2,1382319000,41.985015101,-87.804532006,,,0.0,,,Cash,Taxi Affiliation Services,180,,0.0 | ||
14,16.65,5,7,5,1369897200,41.968069,-87.721559063,,,0.0,,,Cash,Dispatch Taxi Affiliation,1080,,0.0 | ||
13,16.45,11,12,3,1446554700,41.983636307,-87.723583185,,,6.9,,,Cash,,780,,0.0 |
Empty file.