Dear developers,
I would like to suggest the following feature, which would help the many users who work with non-English characters and words in pandas DataFrames.
Currently, pandas offers no way to sort string data using a language-specific collation order, such as Hungarian, French, or other languages with accented Latin letters.
Python itself already provides a solution for this through the `locale` module:
```python
import locale
# Hungarian collation; any other installed locale works too, e.g. 'fr_FR.UTF-8' for French.
locale.setlocale(locale.LC_ALL, 'hu_HU.UTF-8')
a = ["A", "E", "Z", "a", "e", "é", "z", "5", "4", "1",
     "AA", "AÁ", "ÁA", "ÁÁ", "aa", "aá", "áa", "áá"]

sorted(a)  # default: compares Unicode code points, which is wrong for Hungarian and most other languages
# ['1', '4', '5', 'A', 'AA', 'AÁ', 'E', 'Z', 'a', 'aa', 'aá', 'e', 'z', 'ÁA', 'ÁÁ', 'áa', 'áá', 'é']

sorted(a, key=locale.strxfrm)  # locale-aware: the correct order for Hungarian
# ['1', '4', '5', 'a', 'A', 'aa', 'AA', 'aá', 'AÁ', 'áa', 'ÁA', 'áá', 'ÁÁ', 'e', 'E', 'é', 'z', 'Z']
```
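`locale.strxfrm` is the key-function counterpart of the comparison function `locale.strcoll`: comparing two transformed strings gives the same result as calling `strcoll` on the originals, which is why it can be passed as `key=` to `sorted`. Both are governed by the `LC_COLLATE` category. A minimal stdlib sketch of that equivalence, assuming the requested locale may not be installed on every machine (`setlocale` then raises `locale.Error`), in which case it falls back to the `"C"` locale; `set_collation` is a hypothetical helper name.

```python
import locale
from functools import cmp_to_key


def set_collation(name: str) -> str:
    """Hypothetical helper: switch LC_COLLATE to `name`, or fall back to "C"."""
    try:
        return locale.setlocale(locale.LC_COLLATE, name)
    except locale.Error:
        # The locale is not installed (common in minimal containers).
        return locale.setlocale(locale.LC_COLLATE, "C")


active = set_collation("hu_HU.UTF-8")

words = ["A", "E", "Z", "a", "e", "é", "z", "5", "4", "1",
         "AA", "AÁ", "ÁA", "ÁÁ", "aa", "aá", "áa", "áá"]

# Key function: each string is transformed once, then the results are
# compared as ordinary strings.
by_key = sorted(words, key=locale.strxfrm)

# Comparison function: strcoll is called for every pairwise comparison.
by_cmp = sorted(words, key=cmp_to_key(locale.strcoll))

print(active)
print(by_key)  # same order as by_cmp
```

The key form is the one that matters for a DataFrame API, since each value only needs to be transformed once.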
In pandas, however, there is no way to specify the sort order:

```python
df.sort_values()  # sorts by code point: wrong for accented Latin letters and other non-ASCII characters
```
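Without such an option, the locale order has to be applied by hand: transform the column with `locale.strxfrm` into a temporary key column, sort by that column, and drop it again. A minimal sketch, assuming a string column named `name` and a helper column `_key` (both hypothetical), and falling back to the `"C"` locale when `hu_HU.UTF-8` is not installed. It relies only on long-standing pandas methods (`map`, `assign`, `sort_values`, `drop`).

```python
import locale

import pandas as pd

try:
    locale.setlocale(locale.LC_COLLATE, "hu_HU.UTF-8")
except locale.Error:
    # Hungarian locale not installed: fall back to code-point collation.
    locale.setlocale(locale.LC_COLLATE, "C")

df = pd.DataFrame({
    "name": ["Z", "á", "a", "É", "e", "A"],  # hypothetical string column
    "value": [1, 2, 3, 4, 5, 6],
})

df_sorted = (
    df.assign(_key=df["name"].map(locale.strxfrm))  # temporary sort-key column
    .sort_values("_key", kind="mergesort")           # mergesort is stable
    .drop(columns="_key")
)
print(df_sorted)
```

Using a stable sort keeps rows whose keys compare equal in their original order.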
It would be useful to have an option like this:

```python
df.sort_values(key=locale.strxfrm)
```
I would appreciate it if this feature, which already exists in Python, could be implemented in pandas as well.
Thank you!
Expected Output
```python
df.sort_values(key=locale.strxfrm)
# ['1', '4', '5', 'a', 'A', 'aa', 'AA', 'aá', 'AÁ', 'áa', 'ÁA', 'áá', 'ÁÁ', 'e', 'E', 'é', 'z', 'Z']
```
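For reference, pandas 1.1.0 later added a `key` argument to `sort_values` and `sort_index`. Unlike the `key` of Python's `sorted`, it is vectorized: it is called with each sort column as a whole Series and must return a Series of the same shape, so `locale.strxfrm` cannot be passed directly (it would receive a Series instead of a string) and has to be mapped over the values. A minimal sketch assuming pandas 1.1 or later; `collate_key` is a hypothetical helper that only transforms string columns, so numeric columns listed in `by` pass through unchanged, and the `"C"` fallback is again for machines without the Hungarian locale.

```python
import locale

import pandas as pd

try:
    locale.setlocale(locale.LC_COLLATE, "hu_HU.UTF-8")
except locale.Error:
    locale.setlocale(locale.LC_COLLATE, "C")  # fallback when hu_HU is missing


def collate_key(col: pd.Series) -> pd.Series:
    """Hypothetical key: locale-transform string columns, pass others through."""
    if pd.api.types.is_string_dtype(col):
        return col.map(locale.strxfrm)
    return col


s = pd.Series(["A", "E", "Z", "a", "e", "é", "z", "5", "4", "1",
               "AA", "AÁ", "ÁA", "ÁÁ", "aa", "aá", "áa", "áá"])
print(s.sort_values(key=collate_key).tolist())

# With several `by` columns, the key is applied to each of them in turn.
df = pd.DataFrame({"group": [2, 1, 2, 1], "name": ["é", "z", "a", "á"]})
print(df.sort_values(["group", "name"], key=collate_key))
```

With the Hungarian locale installed, the first `print` should match the expected output above; with the `"C"` fallback it shows plain code-point order instead.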