model: scratch: Create Linear Regression model #59

Closed
pdxjohnny opened this issue May 8, 2019 · 5 comments
Labels
enhancement New feature or request

Comments

@pdxjohnny
Member

Assignee: @yashlamba

./scripts/create.sh model scratch

Then fill out model with your implementation of linear regression. (Maybe linear.py)

@pdxjohnny pdxjohnny added this to To do in Beta Release - 0.5.0 via automation May 8, 2019
@pdxjohnny pdxjohnny added the enhancement New feature or request label May 8, 2019
@yashlamba
Contributor

async def features(self, features: Features):
    '''
    Converts repos into training data
    '''
    cols: Dict[str, Any] = {}
    for feature in features:
        col = self.feature_feature_column(feature)
        if col is not None:
            cols[feature.NAME] = col
    return cols

def feature_feature_column(self, feature: Feature):
    '''
    Creates a feature column for a feature
    '''
    dtype = feature.dtype()
    if not inspect.isclass(dtype):
        LOGGER.warning('Unknown dtype %r. Could not create column' % (dtype))
        return None
    if dtype is int or issubclass(dtype, int) \
            or dtype is float or issubclass(dtype, float):
        return tensorflow.feature_column.numeric_column(feature.NAME,
                                                        shape=feature.length())
    LOGGER.warning('Unknown dtype %r. Could not create column' % (dtype))
    return None

def model_dir_path(self, features: Features):
    '''
    Creates the path to the model dir by using the provided model dir and
    the sha384 hash of the concatenated feature names.
    '''
    if self.parent.config.directory is None:
        return None
    model = hashlib.sha384(''.join(features.names()).encode('utf-8'))\
        .hexdigest()
    if not os.path.isdir(self.parent.config.directory):
        raise NotADirectoryError('%s is not a directory'
                                 % (self.parent.config.directory))
    return os.path.join(self.parent.config.directory, model)
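To make the model_dir_path naming scheme concrete, here is a standalone sketch (not DFFML code; the function name and base path are illustrative): each unique combination of feature names hashes to its own model directory.

```python
import hashlib
import os


def model_dir_for(base_directory, feature_names):
    # Concatenate the feature names and take the sha384 hex digest, so
    # every distinct set of features maps to a distinct subdirectory
    digest = hashlib.sha384(
        ''.join(feature_names).encode('utf-8')
    ).hexdigest()
    return os.path.join(base_directory, digest)


path = model_dir_for('/tmp/models', ['commits', 'authors'])
```

The same feature names always produce the same path, which is what lets a model find its previously trained state on disk.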

@pdxjohnny I was implementing applicable_features and found it led to the following functions. Do they need to be re-implemented, or can I pick them up from dnn and change the conditions (for starters, the feature length should be 2, and so on)?

@yashlamba
Contributor

Also, can you suggest how I should go about debugging and testing my code (for now, just checking whether I have received the data successfully or not)?

@pdxjohnny
Member Author

Here's an outline for 2.a.iv and 2.a.v from https://docs.google.com/document/d/16u9Tev3O0CcUDe2nfikHmrO3Xnd4ASJ45myFgQLpvzM/edit#heading=h.s3lkoesyhz9v

Is this what you're talking about?

class SimpleLinearRegression(Model):
    async def applicable_features(self, features):
        if len(features) != 1:
            raise ValueError("simple LR only supports a single feature")
        if features[0].dtype() != int and features[0].dtype() != float:
            raise ValueError("simple LR only supports int or float feature")
        if features[0].length() != 1:
            raise ValueError("simple LR only supports single values (non-matrix / array)")
        features_we_care_about = [features[0].NAME]
        return features_we_care_about

    async def train(self, sources, features):
        features_we_care_about = await self.applicable_features(features)
        xData = []
        yData = []
        async for repo in sources.with_features(features_we_care_about):
            # Grab a subset of the feature data being stored within the repo
            # The subset is features_we_care_about plus the feature we want to predict
            feature_data = repo.features(features_we_care_about + [self.parent.config.predict])
            xData.append(feature_data[features_we_care_about[0]])
            yData.append(feature_data[self.parent.config.predict])
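For the fitting step the outline stops short of, here is a minimal from-scratch sketch of ordinary least squares on a single feature. The function names are illustrative, not DFFML API; the xData/yData names follow the outline's train() loop.

```python
def fit_simple_lr(xData, yData):
    # Ordinary least squares for y = slope * x + intercept
    n = len(xData)
    mean_x = sum(xData) / n
    mean_y = sum(yData) / n
    # slope = covariance(x, y) / variance(x)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xData, yData))
    den = sum((x - mean_x) ** 2 for x in xData)
    slope = num / den
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    return slope * x + intercept


# Data lying exactly on y = 2x + 1, so the fit recovers slope 2, intercept 1
slope, intercept = fit_simple_lr([0, 1, 2, 3], [1, 3, 5, 7])
```

train() would call something like fit_simple_lr on the accumulated xData/yData and persist slope and intercept; predict() then only needs those two numbers.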

@yashlamba
Contributor

Well, this is pretty simple and understandable. I was using dnn as my reference, wherein applicable_features led to self.features, which in turn led to feature_feature_column (which ultimately checked the types).

@pdxjohnny
Member Author

ya dnn is complicated by tensorflow APIs :)

@pdxjohnny pdxjohnny added this to the 0.5.0 Beta Release milestone Jun 27, 2019