Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Type Hinting and Type Aliasing Draft PR #964

Closed
wants to merge 3 commits into from
Closed

Conversation

ABostrom
Copy link
Contributor

@ABostrom ABostrom commented Jun 15, 2021

Reference Issues/PRs

This is a draft PR for type hinting in SKtime.

It's super WIP, but I wanted to start work on it. so others could see my ideas, and comment as I muddle my way through.

I needed a long summer project I could work on slowly.

What should a reviewer concentrate their feedback on?

Typing the data containers is tricky, as we have some different formats.

Everything else should be straightforward.

@ABostrom ABostrom requested a review from mloning June 15, 2021 14:09
@ABostrom ABostrom changed the title type the base abcs for sktime. Type Hinting and Type Aliasing Draft PR Jun 18, 2021
@ABostrom
Copy link
Contributor Author

ABostrom commented Jun 18, 2021

If people want to contribute and help:

Things I'd like others to do some research on. Typing / Aliasing nested pandas data frames. Typing numpy arrays.
https://numpy.org/devdocs/reference/typing.html

@fkiraly
Copy link
Collaborator

fkiraly commented Jun 18, 2021

So, my big question is: what is a "good" way to do runtime type checks or hints, using "implicit type definitions" such as "a pandas.DataFrame with exactly two columns named Alice and Bob with no more than 100 rows"?

Copy link
Contributor

@mloning mloning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ABostrom looks good to me. Just two minor comments.

Is there any way to check if these type hints are consistent with the code in the repo? We could think about adding that to the CI.

We also need to make sure to replace the type hints here to avoid having duplicate definitions of types.

@@ -0,0 +1,32 @@
from typing import Union, Tuple
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this fits better into sktime/base/ rather than sktime/utils

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah. happy to move it. Just picked somewhere for the time being.

@@ -1,11 +1,13 @@
#!/usr/bin/env python3 -u
# -*- coding: utf-8 -*-
# copyright: sktime developers, BSD-3-Clause License (see LICENSE file)

from __future__ import annotations # this is for type hinting
from typing import Dict
__author__ = ["Markus Löning"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move imports above the __author__ and __all__ info?

@mloning mloning added the documentation Documentation & tutorials label Jun 18, 2021
@ABostrom
Copy link
Contributor Author

So, my big question is: what is a "good" way to do runtime type checks or hints, using "implicit type definitions" such as "a pandas.DataFrame with exactly two columns named Alice and Bob with no more than 100 rows"?

So i think this is a pretty open question within python world.

@chrisholder
Copy link
Collaborator

Thought Id jump in and just add my take on this. So first off just some links to typing packages I've used in other projects and are extremely useful we may want to consider (I've linked these to @ABostrom in the past) :
https://pypi.org/project/nptyping/
https://pypi.org/project/data-science-types/

So just quickly how I structure my types for projects is:

  • Have a file in the base directory called base_types.py which is intended to contain 'shared types'. These are utility esc types for things non project specific like numpy types, dataframe types etc.
  • For types specific to a class or function I declare them in the same file they exist in.

My motivation for doing it this way doesn't come from Python. As @ABostrom said above it's quite an open conversation currently in the Python community, so I decided to structure my types like I would as if I was using Typescript. Typescript is similar in the sense it's built on top of a language that doesn't have types (javascript) . Typescript is just a wrapper around your javascript which allows type checking to be done at run time. Once checks are complete the wrapper is removed and plain javascript is outputted. This is essentially doing the same thing python typing is doing (although Typescript is more established and more powerful for typing stuff).

While I certainly don't think this is the final end solution potentially drawing some of how Typescript does types could be of benefit just because it is so well established and used.

@fkiraly
Copy link
Collaborator

fkiraly commented Jun 19, 2021

maybe also useful to point out this old enhancement proposal of mine:
https://github.com/sktime/enhancement-proposals/blob/main/steps/05_scitype_based_IO_checks/step.md

look at the "type checking" at the bottom, this could be its own python package.

Regarding the main content, I think we're moving towards the nested architecture (see foreceasting base class) instead of the decorator architecture discussed in the STEP.

@fkiraly
Copy link
Collaborator

fkiraly commented Jun 19, 2021

@ABostrom, @chrisholder, do you know anything good for declarative typing?

E.g., I want to assign a type "mytype" to a variable x, so I can do sth like first assign_custom_type(x, "mytype"), then query custom_type_of(x) which should return "mytype"? And/or the ability to define your own types and conduct inference on them.

As opposed to inferring type based on some pre-defined or pre-existing type system.

@chrisholder
Copy link
Collaborator

@ABostrom, @chrisholder, do you know anything good for declarative typing?

E.g., I want to assign a type "mytype" to a variable x, so I can do sth like first assign_custom_type(x, "mytype"), then query custom_type_of(x) which should return "mytype"? And/or the ability to define your own types and conduct inference on them.

As opposed to inferring type based on some pre-defined or pre-existing type system.

Not 100% sure what you mean but did this example of what I thought you mean. You can't guarentee that someone will assign the correct value but you can definitely add additional checks.

image

I also saw this in the python documentation:

image

@fkiraly
Copy link
Collaborator

fkiraly commented Jun 19, 2021

@chrisholder, see #980 and the linked extension proposal no.5 therein - hope that makes it clearer

@ABostrom
Copy link
Contributor Author

https://pypi.org/project/data-science-types/

This has been depcreated and abandoned, based on pandas and numpy pushing this internally.

@ABostrom
Copy link
Contributor Author

The only dependency I would like to propose we add is : https://github.com/python/mypy

Mypy can be sued to staticly check the type hints we add to our programs, and I think can be a big push in the direction of code correctness validation.

@mloning
Copy link
Contributor

mloning commented Jun 20, 2021

Mypy can be sued to staticly check the type hints we add to our programs, and I think can be a big push in the direction of code correctness validation.

That sounds like a good idea, having this run in the CI pipelines would be important to ensure that that code/type hints make sense

@mloning mloning closed this Jun 20, 2021
@mloning mloning reopened this Jun 20, 2021
@mloning
Copy link
Contributor

mloning commented Jun 20, 2021

Sorry, accidentally closed the PR, reopened now.

@TonyBagnall
Copy link
Contributor

hi @ABostrom , there is conflict with main now, can we close this PR and reopen it from a branch? That way I can tidy it up

@fkiraly
Copy link
Collaborator

fkiraly commented Aug 11, 2021

should we perhaps complete the datatypes module first, which brings a number of related things under one umbrella?

@fkiraly
Copy link
Collaborator

fkiraly commented Aug 27, 2021

Alright, the datatypes PR is finally merged! #1225

I would suggest: let's move the _typing into the datatypes module and ensure we are consistent with mtype/scitype names, see SCITYPE_REGISTER in the _registry file.

I think Panel and Series already are consistent with your choice of type aliases (on which I based the scitype names), so no change needed as far as I can see.

@fkiraly
Copy link
Collaborator

fkiraly commented Sep 5, 2021

@ABostrom, ping

@mloning
Copy link
Contributor

mloning commented Dec 4, 2021

I'll close this for now, feel free to re-open if you want to pick this up again, I've also started playing around with it here #1679

@mloning mloning closed this Dec 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Documentation & tutorials
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants