AttributeError: 'DataFrame' object has no attribute 'name' #666

Closed
islrnd opened this issue Dec 18, 2019 · 18 comments · Fixed by #681

Comments

@islrnd

islrnd commented Dec 18, 2019

AttributeError                            Traceback (most recent call last)
<ipython-input-32-8d38637b6cc6> in <module>
      6 
      7 oversampler=SMOTE(random_state=42)
----> 8 smote_train, smote_target = oversampler.fit_resample(X,y)
      9 
     10 print("Before OverSampling, counts of label '0', '1':", smote_target['label'].value_counts())

~\Anaconda3\lib\site-packages\imblearn\base.py in fit_resample(self, X, y)
     73         """
     74         check_classification_targets(y)
---> 75         X, y, binarize_y = self._check_X_y(X, y)
     76 
     77         self.sampling_strategy_ = check_sampling_strategy(

~\Anaconda3\lib\site-packages\imblearn\base.py in _check_X_y(self, X, y, accept_sparse)
    148         if hasattr(y, "loc"):
    149             # store information to build a series
--> 150             self._y_name = y.name
    151             self._y_dtype = y.dtype
    152         else:

~\Anaconda3\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   5065             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5066                 return self[name]
-> 5067             return object.__getattribute__(self, name)
   5068 
   5069     def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'name'
@chkoar
Member

chkoar commented Dec 18, 2019

@glemaitre I think this is related to the sanity check we were discussing, no? The user passes something that has .loc, but it is not a Series as we expect; it is a DataFrame.

@glemaitre
Member

Yes, but type checking is not Pythonic. We should probably look for attributes that "quack" more specifically like a Series.
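
To make the point above concrete, here is a minimal sketch, assuming only pandas is installed. It shows that a DataFrame and a Series both expose .loc while only the Series has .name, and then tries one duck-typed alternative based on ndim. The helper looks_like_series is hypothetical and is not the check imbalanced-learn adopted.

```python
import pandas as pd

y_frame = pd.DataFrame({"label": [0, 1, 0, 0]})
y_series = y_frame["label"]

# Both objects expose .loc, so hasattr(y, "loc") cannot tell them apart.
both_have_loc = hasattr(y_frame, "loc") and hasattr(y_series, "loc")

# Only the Series carries .name; on this frame the lookup raises AttributeError.
try:
    y_frame.name
    frame_has_name = True
except AttributeError:
    frame_has_name = False


def looks_like_series(obj):
    """Hypothetical helper: duck-type a 1-D pandas object without isinstance."""
    return hasattr(obj, "loc") and getattr(obj, "ndim", None) == 1
```

Checking ndim avoids an explicit type check and still separates the two cases: it is 1 for a Series and 2 for a DataFrame.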

@glemaitre
Member

@islrnd you should pass a NumPy array or a pandas Series as y, not a DataFrame.
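
A short sketch of how the target can end up as a DataFrame and how to turn it into a Series or a 1-D array, assuming only pandas is installed. The column name "label" comes from the print statement in the traceback; the feature column "f1" and the data values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"f1": [0.1, 0.4, 0.35, 0.8], "label": [0, 0, 0, 1]})

X = df.drop(columns="label")

y_frame = df[["label"]]   # double brackets -> DataFrame, shape (4, 1)
y_series = df["label"]    # single brackets -> Series, shape (4,)

# Converting a single-column frame that already exists:
y_from_frame = y_frame.iloc[:, 0]       # Series named "label"
y_array = y_frame.to_numpy().ravel()    # 1-D NumPy array
```

With a Series as y, the resampled target should also be one-dimensional, so the class counts would be read with smote_target.value_counts() rather than smote_target['label'].value_counts().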

@00krishna

This is a confusing issue. If I pass in a DataFrame, I get the error about no attribute 'name'. If I pass in a Series, I get a different error: ValueError: Found array with 0 feature(s) (shape=(7788867, 0)) while a minimum of 1 is required. So no matter how I pass the data in, it raises an error.

@chkoar
Member

chkoar commented Dec 18, 2019

@00krishna please post sample code to reproduce your error.

@00krishna

00krishna commented Dec 18, 2019

@chkoar Sorry, I realized I have all categorical features and that is causing the problem.
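
The thread does not show how the features were prepared, so the following is only a guess at one way an all-categorical frame can end up with zero feature columns. It assumes only pandas is installed, and the column names and values are invented. One-hot encoding is shown as one common workaround, not as a recommendation from the maintainers.

```python
import pandas as pd

# Invented all-categorical feature frame.
X = pd.DataFrame({"color": ["red", "blue", "red"], "size": ["S", "M", "S"]})

# Keeping only numeric columns leaves no features at all: shape (3, 0).
numeric_only = X.select_dtypes(include="number")

# One-hot encoding turns each category into its own indicator column.
encoded = pd.get_dummies(X)
```

Checking X.shape and X.dtypes before calling the sampler makes this kind of problem visible early.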

@parth-radonc

@00krishna I am facing a similar issue. Can you help me with it?

@chkoar
Member

chkoar commented Jan 13, 2020

@parth-mango You are probably passing a DataFrame as y. You should pass a Series object. If you want, provide a minimal reproducible example.

@flowersw

I had the same issue: I was accidentally passing in a DataFrame for y instead of a Series.

@chkoar
Member

chkoar commented Jan 26, 2020

@flowersw #673 or a follow-up PR will hopefully solve this problem.

@serjko

serjko commented Jan 28, 2020

Hi there,

I'm facing a similar issue that wasn't resolved after converting the initial DataFrame to a Series. Please take a look at my question on Data Science Stack Exchange: https://datascience.stackexchange.com/questions/67141/passing-data-to-smote-after-applying-train-test-split

@chkoar
Member

chkoar commented Jan 28, 2020

@serjko please post a minimal reproducible example and your package versions.

@serjko

serjko commented Jan 28, 2020

Hi @chkoar, thanks for jumping in so quickly. This was a false alarm. The root cause was my dataset and not SMOTE. I made it work after cleaning up the data and passing a Series as the 2nd argument to fit_sample. Apologies for bothering you and thanks again for the answer!

@ertanuj96

@chkoar, hey, I am facing a similar issue when I run a regex string match over an entire DataFrame.
Note: I cannot share the xlsx file here; you can ask me for it in private.
Code snippet:
import os
import time
import sys
import subprocess
import re
import pandas as pd
from openpyxl import load_workbook
import pyperclip

from tkinter import Tk
from tkinter.filedialog import askopenfilename

def get_lookup_excel_path():
    parent_dir = '/root/Desktop/lookup.xlsx'
    return parent_dir

def upload_spreadsheet(path, active_sheet_only=True):
    '''Returns pandas dataframe. Returns empty dataframe if fails'''
    # check if file exists
    if not os.path.isfile(path):
        return pd.DataFrame()

    # check file type
    file_type = os.path.splitext(path)[1]

    if file_type == '.csv':
        df = pd.read_csv(path)
    elif file_type in ['.xlsx', '.xlsm', '.xltx', '.xltm']:
        wb = load_workbook(path)

        # convert sheets to pandas dataframe
        if active_sheet_only:
            df = pd.DataFrame(wb.active.values)
        else:
            # combine all sheets into one dataframe
            frames = [pd.DataFrame(sheet.values) for sheet in wb.worksheets]
            df = pd.concat(frames)
        wb.close()
    else:
        return pd.DataFrame()

    return df

def combine_dataframe(dataframe):
    '''Converts pandas dataframe into one list'''
    return [cell for column in dataframe for cell in dataframe[column]]

def get_springer_books_excel():
    parent_dir = '/root/Desktop/springnature.xlsx'
    return parent_dir

def main():
    file_path = get_lookup_excel_path()
    if file_path == '':
        return
    df = upload_spreadsheet(file_path, active_sheet_only=True)
    search_values = ['Engineering', 'Computer']
    # df.name only resolves if the frame has a column labelled 'name';
    # otherwise this line raises the AttributeError.
    df[df.name.str.contains('|'.join(search_values))]

if __name__ == '__main__':
    main()
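
A likely cause in the snippet above is that pd.DataFrame(wb.active.values) labels the columns 0, 1, 2, ... and leaves the header row as ordinary data, so df.name has nothing to resolve to. The sketch below illustrates this with plain tuples standing in for the worksheet rows, assuming only pandas is installed. The "name" column is inferred from the df.name call; the other column and all cell values are invented.

```python
import pandas as pd

# Stand-in for wb.active.values: an iterable of row tuples, header row first.
rows = [
    ("name", "year"),
    ("Engineering Mathematics", 2019),
    ("Computer Networks", 2018),
    ("Organic Chemistry", 2020),
]

raw = pd.DataFrame(rows)               # columns are 0 and 1
has_name_attr = hasattr(raw, "name")   # False: no column is labelled "name"

# Promote the first row to column labels before filtering.
header, *body = rows
df = pd.DataFrame(body, columns=list(header))

search_values = ["Engineering", "Computer"]
matches = df[df["name"].str.contains("|".join(search_values))]
```

Bracket access (df["name"]) fails with a KeyError that names the missing column, which is usually easier to diagnose than the AttributeError raised by attribute access.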

@chkoar
Member

chkoar commented Feb 5, 2020

hey, I am facing a similar issue when I run a regex string match over an entire DataFrame.

@ertanuj96 sorry, I do not get it. Is this related to imbalanced-learn?

@ramiazmi

I have experienced this before. I resolved it by passing a DataFrame as X and a Series as y.

@chkoar
Member

chkoar commented Feb 16, 2020

@ramiazmi #681 has solved this issue, so 0.6.2 (which is not released yet) will fix such problems.

@glemaitre
Member

0.6.2 is out on PyPI and will shortly be available on conda-forge. Locking this issue.

scikit-learn-contrib locked as resolved and limited conversation to collaborators on Feb 16, 2020