QST: About the meaning of ~ used in dfs = dfs.loc[:, ~dfs.columns.str.contains('^Unnamed', na=False)]. #43832

Closed
2 tasks done
hongyi-zhao opened this issue Oct 1, 2021 · 2 comments
Labels
Needs Triage Issue that has not been reviewed by a pandas team member Usage Question

hongyi-zhao commented Oct 1, 2021

  • I have searched the [pandas] tag on StackOverflow for similar questions.

  • I have asked my usage related question on StackOverflow.

Link to question on StackOverflow

https://stackoverflow.com/a/64335734

Question about pandas

Currently, I'm using the following method to get rid of the Unnamed: 0 column in a pandas DataFrame, following the comments noted here and here:

import pandas as pd

excel = '2021-2022-1.xlsx'
dfs = pd.read_excel(excel)
# Keep only the columns whose names do not start with "Unnamed".
dfs = dfs.loc[:, ~dfs.columns.str.contains('^Unnamed', na=False)]

But I am puzzled by the meaning of the ~ symbol used above. I tried to find an explanation by googling and by digging into the pandas source code, but found nothing. Any hints would be greatly appreciated.

Regards,
HZ

@hongyi-zhao hongyi-zhao added Needs Triage Issue that has not been reviewed by a pandas team member Usage Question labels Oct 1, 2021
Member

phofl commented Oct 1, 2021

This is not specific to pandas. ~ is Python's unary inversion operator:

https://docs.python.org/3/reference/expressions.html#unary-arithmetic-and-bitwise-operations
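
As a minimal sketch of what that means in practice (not part of the original reply): on a plain int, ~ is bitwise inversion, and any class can define its own behavior for it through the __invert__ method. NumPy does this for arrays, which is why ~ on a boolean array flips each element. The Flag class below is a hypothetical toy, included only to show the hook.

```python
import numpy as np

# On a plain Python int, ~ is bitwise inversion: ~x == -x - 1.
int_result = ~5

# NumPy overloads the same operator (through __invert__) for arrays.
# On a boolean array it flips every element, i.e. element-wise logical NOT.
mask = np.array([True, False, False])
inverted = ~mask


# Hypothetical toy class, only to show the method Python calls for ~obj.
class Flag:
    def __init__(self, value):
        self.value = value

    def __invert__(self):
        return Flag(not self.value)


flipped = ~Flag(True)
print(int_result, inverted.tolist(), flipped.value)
```

The boolean array returned by dfs.columns.str.contains(...) goes through the same mechanism, so ~ turns a "matches Unnamed" mask into a "does not match Unnamed" mask.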

@phofl phofl closed this as completed Oct 1, 2021
@hongyi-zhao hongyi-zhao changed the title QST: About the meaninf of ~ used in dfs = dfs.loc[:, ~dfs.columns.str.contains('^Unnamed', na=False)]. QST: About the meaning of ~ used in dfs = dfs.loc[:, ~dfs.columns.str.contains('^Unnamed', na=False)]. Oct 1, 2021
Author

hongyi-zhao commented Oct 1, 2021

Based on your notes, I ran the following checks to confirm the behavior:

In [1]: import pandas as pd
In [3]: dfs = pd.read_excel('2020-2021-2.xlsx')
In [11]: import numpy as np

In [17]: dfs.columns.str.contains('^Unnamed', na=False)
Out[17]: 
array([ True, False, False, False, False, False, False, False, False,
       False])

In [18]: ~dfs.columns.str.contains('^Unnamed', na=False)
Out[18]: 
array([False,  True,  True,  True,  True,  True,  True,  True,  True,
        True])

In [19]: bool_arr = np.array([True, False, False, False, False, False, False, False, False, False], dtype=bool)
In [20]: ~bool_arr
Out[20]: 
array([False,  True,  True,  True,  True,  True,  True,  True,  True,
        True])

So ~ inverts the boolean mask element by element, and .loc then keeps only the columns whose names do not start with Unnamed. See here, here, here, here, and here for some additional explanations.
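
The checks above depend on a local Excel file. A self-contained sketch of the same filter, assuming a small in-memory DataFrame as a stand-in for the spreadsheet (the column names and values are made up for illustration), might look like this:

```python
import pandas as pd

# Hypothetical stand-in for the spreadsheet: a frame with a stray "Unnamed: 0" column.
dfs = pd.DataFrame({"Unnamed: 0": [0, 1], "name": ["a", "b"], "score": [90, 85]})

# True where the column name starts with "Unnamed"; na=False treats missing names as no match.
is_unnamed = dfs.columns.str.contains("^Unnamed", na=False)

# ~ flips the mask, so .loc keeps every column that is NOT unnamed.
cleaned = dfs.loc[:, ~is_unnamed]
print(list(cleaned.columns))  # ['name', 'score']
```

Splitting the mask into its own variable is only for readability; the one-line form from the question does the same thing.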
