Add support for IP Address and MAC Address data #18767
Hi all, this is a proposal to add a new block and type for representing IP Addresses.
Here's a notebook demonstrating the basics: http://nbviewer.jupyter.org/gist/TomAugspurger/3ba2bc273edfec809b61b5030fd278b9
Proposal to add support for storing and operating on IP Address data.
For some communities, IP and MAC addresses are a common data format. The data
I turned to StackOverflow to gauge interest in this topic. A search for "IP" on
Categorical, which is already in pandas, turned up 1,089 items.
Overall, I think there's enough interest relative to the implementation /
The proposal is to add
The type and block should be generic IP address blocks, with no
Since IPv6 addresses are 128 bits, they do not fit into a standard NumPy uint64
Each record will be composed of two uint64s. The first element
base = np.dtype([('lo', '>u8'), ('hi', '>u8')])
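As a minimal sketch of how 128-bit values could be stored in that dtype, the helpers below split Python integers into the two 64-bit fields and join them back. The names `pack` and `unpack` are hypothetical and not part of the proposal; fields are addressed by name, so the sketch does not depend on field order.

```python
import numpy as np

# Same structured dtype as in the proposal: two big-endian uint64 fields.
base = np.dtype([('lo', '>u8'), ('hi', '>u8')])

MASK64 = (1 << 64) - 1


def pack(values):
    """Split Python ints (0 <= v < 2**128) into 'hi' and 'lo' halves."""
    out = np.zeros(len(values), dtype=base)
    for i, v in enumerate(values):
        if not 0 <= v < 2 ** 128:
            raise ValueError(f"{v} does not fit in 128 bits")
        out['hi'][i] = v >> 64      # upper 64 bits
        out['lo'][i] = v & MASK64   # lower 64 bits
    return out


def unpack(arr):
    """Rebuild Python ints from the two 64-bit fields."""
    return [(int(hi) << 64) | int(lo) for hi, lo in zip(arr['hi'], arr['lo'])]


records = pack([10, 2 ** 64, 2 ** 128 - 1])
assert unpack(records) == [10, 2 ** 64, 2 ** 128 - 1]
```

Each record occupies 16 bytes, so an array of addresses stays a single contiguous buffer rather than an object array of Python integers.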
This is a common format for handling IPv4 and IPv6 data:
Use the lowest possible IP address as a marker. According to RFC2373,
The new user-facing
IPAddress.from_pyints(cls, values: Sequence[int]) -> 'IPAddress':
    """Construct an IPAddress array from a sequence of python integers.

    >>> IPAddress.from_pyints([10, 18446744073709551616])
    <IPAddress(['0.0.0.10', '::1'])>
    """

IPAddress.from_str(cls, values: Sequence[str]) -> 'IPAddress':
    """Construct an IPAddress from a sequence of strings."""
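To make the two constructors concrete, here is a rough sketch built on the standard-library `ipaddress` module. `IPAddressSketch` and `to_strings` are hypothetical names for illustration only; the sketch stores plain Python integers rather than the proposed block, and it assumes that values below 2**32 are displayed as IPv4.

```python
import ipaddress
from typing import List, Sequence


class IPAddressSketch:
    """Hypothetical stand-in for the proposed IPAddress array."""

    def __init__(self, ints: List[int]):
        self._ints = ints

    @classmethod
    def from_pyints(cls, values: Sequence[int]) -> "IPAddressSketch":
        for v in values:
            if not 0 <= v < 2 ** 128:
                raise ValueError(f"{v} does not fit in 128 bits")
        return cls(list(values))

    @classmethod
    def from_str(cls, values: Sequence[str]) -> "IPAddressSketch":
        # ipaddress.ip_address parses both IPv4 and IPv6 text.
        return cls([int(ipaddress.ip_address(v)) for v in values])

    def to_strings(self) -> List[str]:
        # ip_address treats integers below 2**32 as IPv4, so small IPv6
        # values such as '::1' do not round-trip in this sketch.
        return [str(ipaddress.ip_address(v)) for v in self._ints]


print(IPAddressSketch.from_pyints([10]).to_strings())
print(IPAddressSketch.from_str(["192.168.0.1", "2001:db8::1"]).to_strings())
```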
The methods in the new
An implementation of the types and block is available at
Adding a new block type to pandas is a major change. Downstream libraries may
Some alternatives to this that exist outside of pandas:
To expand a bit on the (current) downside of alternative 2, when the pandas constructors
In : import pandas as pd
In : import pandas_ip as ip
In : arr = ip.IPAddress.from_pyints([1, 2])
In : arr
Out: <IPAddress(['0.0.0.1', '0.0.0.2'])>
In : pd.Series(arr)
Out:
0    <IPAddress(['0.0.0.1', '0.0.0.2'])>
dtype: object
I'd rather not have to make a subclass of Series, just to stick an array-like thing into a Series.
If pandas could provide an interface such that objects satisfying that interface
Wow, detailed proposal!
First question that comes to my mind: why does it need to be included in pandas (from a technical point of view)? Or to put it differently: what is currently in
E.g. in geopandas the GeometryBlock can be stored in a Series as well, the main reason we have the subclasses GeoSeries and GeoDataFrame is to add a bunch of additional methods (but which could be solved with an accessor).
Unless I'm missing something, there isn't a good way to stuff an arbitrary "thing" into the regular
In : import pandas as pd
In : import pandas_ip as ip
In : arr = ip.IPAddress.from_pyints([1, 2])
In : arr
Out: <IPAddress(['0.0.0.1', '0.0.0.2'])>
In : pd.Series(arr)
Out:
0    <IPAddress(['0.0.0.1', '0.0.0.2'])>
dtype: object
AFAICT, the only way to do this from outside pandas is to construct blocks directly and use fastpath
In : pd.Series(ip.IPBlock(arr, slice(0, 1)), pd.RangeIndex(2), fastpath=True)
Out:
0    0.0.0.1
1    0.0.0.2
dtype: ip
So an alternative to my proposal would be to make something like
(edited a bug in my example).
I could imagine coming up with an interface where if an object passed to the interface satisfies it, we dispatch some of the
Then pandas can (maybe) figure out the right thing to do. To be clear, I'd be more than satisfied if we can make this solution work.
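A minimal sketch of what such an interface could look like, written as an abstract base class. Every name here (`ArrayInterfaceSketch`, `dtype_name`, `take`, `wrap`, `IPv4Ints`) is hypothetical and chosen for illustration; none of it is existing pandas API, and `wrap` is only a toy stand-in for a constructor.

```python
from abc import ABC, abstractmethod


class ArrayInterfaceSketch(ABC):
    """Hypothetical contract a third-party array would satisfy."""

    @property
    @abstractmethod
    def dtype_name(self) -> str:
        """Short name to display as the dtype."""

    @abstractmethod
    def __len__(self) -> int:
        """Number of elements."""

    @abstractmethod
    def __getitem__(self, item):
        """Scalar access by position."""

    @abstractmethod
    def take(self, indices):
        """Return a new array with the elements at the given positions."""


class IPv4Ints(ArrayInterfaceSketch):
    """Toy implementation that stores addresses as Python ints."""

    def __init__(self, values):
        self._values = list(values)

    @property
    def dtype_name(self) -> str:
        return "ip"

    def __len__(self) -> int:
        return len(self._values)

    def __getitem__(self, item):
        return self._values[item]

    def take(self, indices):
        return IPv4Ints([self._values[i] for i in indices])


def wrap(values):
    """Toy constructor: dispatch on the interface, else treat as one object."""
    if isinstance(values, ArrayInterfaceSketch):
        return {"dtype": values.dtype_name, "length": len(values)}
    return {"dtype": "object", "length": 1}


print(wrap(IPv4Ints([1, 2])))
print(wrap(object()))
```

The point of the abstract methods is that an incomplete implementation fails at instantiation time, so a constructor can rely on the full set of methods being present once the `isinstance` check passes.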
I was actually thinking about this yesterday, but in the context of
Obviously a bit of work would need to be done on IP Addresses and
Additionally, the PostgreSQL docs might be useful as an additional reference/another perspective in general:
Updated the original with some information on why doing this outside pandas is (currently) difficult, but I'd be happy to work on making that smoother.
@jschendel, yes I was just reading through https://docs.python.org/3/howto/ipaddress.html#defining-networks on this. I'm not especially familiar with the network side of things, so I'm not sure what that would look like.
And good call on using Postgres for design inspiration.
I'm not opposed to having an IP type in pandas, but it does seem like it could be an interesting case to try to develop an "extension block API" around, i.e., you do something like subclass Block and ExtensionDtype and through metaclass registration or whatever, everything works!
That said, I really don't know our own internal interfaces well enough to know if this is feasible without massive refactoring or even a good idea.
FWIW, I plan to experiment with defining an interface through ABCs next week. I'll update with how that turns out.
On Thu, Dec 14, 2017 at 3:27 PM, chris-b1 wrote:
I'm not opposed to having an IP type in pandas, but does seem like it could be an interesting case to try develop an "extension block API" around, i.e., you do something like subclass Block and ExtensionDtype and through metaclass registration or whatever, everything works!
That said, I really don't know our own internal interfaces well enough to know if this is feasible without massive refactoring or even a good idea.
Yes, that is correct. That is also something with which I have struggled in geopandas.
For the short term, you could provide functional constructors like
BTW, the fact that it doesn't see your ip array-like as an array-like and unwraps it in a series (so getting series of length 2) feels like a bug in pandas (in
An alternative interface could be pandas checking for a
Can you explain this in a bit more detail?
I haven't (yet) implemented the methods to make that IP array an iterable.
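As a rough illustration of why the missing methods matter, the sketch below assumes a constructor decides between "array" and "scalar" with an iterability check. That check is an assumption for illustration, not a description of what pandas does internally, and `OpaqueArray`, `IterableArray`, and `looks_list_like` are hypothetical names.

```python
from collections.abc import Iterable


class OpaqueArray:
    """Holds values but defines no sequence methods."""

    def __init__(self, values):
        self._values = list(values)


class IterableArray(OpaqueArray):
    """Same storage, plus the methods that make it behave like a sequence."""

    def __len__(self):
        return len(self._values)

    def __getitem__(self, i):
        return self._values[i]

    def __iter__(self):
        return iter(self._values)


def looks_list_like(obj):
    # Assumed check: iterable, but not a string or bytes object.
    return isinstance(obj, Iterable) and not isinstance(obj, (str, bytes))


print(looks_list_like(OpaqueArray([1, 2])))    # False: would be treated as one object
print(looks_list_like(IterableArray([1, 2])))  # True: could be unpacked into 2 elements
```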
A class (ABC or otherwise) that contains enough information for the pandas constructors to do the right thing (the
Yes, cyberpandas has a MACArray type. https://cyberpandas.readthedocs.io/en/latest/api.html#macarray Feel free to open an issue at https://github.com/ContinuumIO/cyberpandas if you have questions / issues.
On Tue, Jun 19, 2018 at 5:32 PM, Mike Pennington wrote:
@TomAugspurger the title of this issue mentions mac-addresses; I see that cyberpandas groks IPs now, but is there a solution for mac addresses? If so, can you elaborate?