Feature Request: Better Handling Nan and None Values #114

Closed
Freed-Wu opened this issue Aug 22, 2022 · 16 comments

@Freed-Wu
Contributor

import plotext as plt
y = plt.sin()
import numpy as np
y[:40] = [np.nan] * 40
plt.scatter(y)
plt.show()

Expected: a plot whose x range is [0, 200], with no points in [0, 40].

Actual: a plot whose x range is [40, 200].

@piccolomo
Owner

Hi @Freed-Wu,

I am working through the issues. Regarding this one, I have no idea how to solve it. The function I use to remove non-numerical values literally removes all of them (nan or None):

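import math

def try_float(el):  # stand-in (assumption): plotext's real helper is not shown in the thread
    try:
        return float(el)
    except (TypeError, ValueError):
        return el  # e.g. date strings or bar labels stay unchanged

# True for None or nan; strings are never treated as non-numerical
non_numerical = lambda el: el is None or (not isinstance(el, str) and math.isnan(el))

def remove_non_numerical(x, y):  # removes None and nan values but keeps strings for possible date or bar plots
    p = [i for i in range(len(x)) if not (non_numerical(x[i]) or non_numerical(y[i]))]
    xn = [try_float(x[i]) for i in p]
    yn = [try_float(y[i]) for i in p]
    return xn, yn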

Any ideas?

@Freed-Wu
Contributor Author

Freed-Wu commented Sep 1, 2022

Take matplotlib as an example:

import numpy as np
from matplotlib import pyplot as plt
a = np.arange(10.)
a[2:5] = [np.nan] * 3
plt.plot(a, "b")
plt.show()

is equivalent to

import numpy as np
from matplotlib import pyplot as plt
a = np.arange(10.)
a[2:5] = [np.nan] * 3
plt.plot(range(0, 2), a[0:2], "b")
plt.plot(range(5, 10), a[5:10], "b")
plt.show()

[screenshot of the resulting plot]

So I think this problem can be solved by splitting the array into segments at each np.nan and plotting them separately.
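A minimal sketch of the splitting idea in plain Python. `split_at_nan` is a hypothetical helper, not part of plotext or matplotlib; it assumes missing values are either `None` or a float nan (this covers `np.nan` and NumPy `float64` scalars, which subclass `float`), and it keeps each value's original index so every segment stays at its position on the x axis.

```python
import math
from itertools import groupby

def split_at_nan(y):
    """Split y into contiguous runs of valid values (hypothetical helper).

    Returns a list of (xs, ys) pairs, where xs are the original indices,
    so each run keeps its position on the x axis.
    """
    def is_missing(v):
        # None and nan both count as missing
        return v is None or (isinstance(v, float) and math.isnan(v))

    segments = []
    # group consecutive (index, value) pairs by whether the value is missing
    for missing, run in groupby(enumerate(y), key=lambda iv: is_missing(iv[1])):
        if not missing:
            run = list(run)
            segments.append(([i for i, _ in run], [v for _, v in run]))
    return segments

a = [0.0, 1.0, float("nan"), float("nan"), float("nan"), 5.0, 6.0, 7.0, 8.0, 9.0]
print(split_at_nan(a))
# [([0, 1], [0.0, 1.0]), ([5, 6, 7, 8, 9], [5.0, 6.0, 7.0, 8.0, 9.0])]
```

Each returned pair could then be passed to a separate plotting call.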

import numpy as np
import plotext as plt
a = np.arange(10.)
a[2:5] = [np.nan] * 3
plt.plot(a)
plt.show()

@piccolomo
Owner

piccolomo commented Sep 1, 2022

It's a good idea, but I think it would still produce the same result as in your initial code / post.

@piccolomo
Owner

Also, if you use scatter in your latest code, the plot works:

import numpy as np
import plotext as plt
a = np.arange(10.)
a[2:5] = [np.nan] * 3
plt.scatter(a)
plt.show()

[screenshot of the resulting plot]

@Freed-Wu
Contributor Author

Freed-Wu commented Sep 1, 2022

> It's a good idea, but I think it would still produce the same result as in your initial code / post.

import numpy as np
import plotext as plt
a = np.arange(10.)
a[2:5] = [np.nan] * 3
plt.plot(range(0, 2), a[0:2], color="blue")
plt.plot(range(5, 10), a[5:10], color="blue")
plt.show()

[screenshot of the resulting plot]

  1. A function that splits the array at each np.nan into a list of sub-arrays.
  2. For each sub-array in the list, plot it with its corresponding x coordinates.
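The two steps above could also be written with NumPy, as a rough sketch. `plot_segments` and its `plot_fn` hook are hypothetical names introduced here; the assumption is that `plot_fn` would wrap a real plotting call such as plotext's `plt.plot(xs, ys, color="blue")`, called once per segment, and that the y data is numeric so it can be converted with `dtype=float`.

```python
import numpy as np

def plot_segments(x, y, plot_fn):
    """Call plot_fn(xs, ys) once per contiguous run of non-nan y values.

    plot_fn is a hypothetical hook; with plotext it could be something like
    lambda xs, ys: plt.plot(xs, ys, color="blue").
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    idx = np.flatnonzero(~np.isnan(y))  # positions of valid points
    if idx.size == 0:
        return 0
    # start a new segment wherever consecutive valid indices are not adjacent
    breaks = np.flatnonzero(np.diff(idx) > 1) + 1
    for part in np.split(idx, breaks):
        plot_fn(x[part].tolist(), y[part].tolist())
    return len(breaks) + 1  # number of segments drawn

a = np.arange(10.)
a[2:5] = [np.nan] * 3
calls = []
n = plot_segments(range(10), a, lambda xs, ys: calls.append((xs, ys)))
print(n, calls)
```

Because the plotting function is called once per segment, any per-series options (color, marker, label) have to be repeated on every call, which is the drawback the maintainer points out later in the thread.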

@piccolomo
Owner

piccolomo commented Sep 1, 2022

If you let matplotlib handle the first code you sent, you get the same result as in plotext:

import matplotlib.pyplot as plt; from plotext import sin
y = sin()
import numpy as np
y[:40] = [np.nan] * 40
plt.scatter(range(len(y)), y)
plt.show()

[screenshot of the resulting plot]

and in plotext

import plotext as plt; from plotext import sin
y = sin()
import numpy as np
y[:40] = [np.nan] * 40
plt.scatter(range(len(y)), y)
plt.show()

[screenshot of the resulting plot]

@piccolomo
Owner

piccolomo commented Sep 1, 2022

This is a slightly different issue: here the problem is that plot() treats the data as a single signal and so draws lines between consecutive points, while it would be better if it skipped the nan values. I will think about this, but it's not trivial.
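One way to "jump" the nan values without splitting the series is to decide, pair by pair, which line segments to draw. This is an illustration only, not plotext's actual drawing code; `line_segments` is a hypothetical name, and an isolated valid point between two missing values produces no segment (it would need a marker of its own).

```python
import math

def line_segments(x, y):
    """Return ((x0, y0), (x1, y1)) pairs for consecutive points that are both valid.

    A pair is skipped when either endpoint has a None or nan coordinate, so the
    line "jumps" over missing values while the data stays one series.
    """
    def ok(v):
        return v is not None and not math.isnan(v)

    segs = []
    for i in range(len(y) - 1):
        if ok(x[i]) and ok(y[i]) and ok(x[i + 1]) and ok(y[i + 1]):
            segs.append(((x[i], y[i]), (x[i + 1], y[i + 1])))
    return segs

nan = float("nan")
print(line_segments(range(6), [0.0, 1.0, nan, nan, 4.0, 5.0]))
# [((0, 0.0), (1, 1.0)), ((4, 4.0), (5, 5.0))]
```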


@Freed-Wu
Contributor Author

Freed-Wu commented Sep 1, 2022

> If you let matplotlib handle the first code you sent, you get the same result as in plotext

Oh, sure. That must be the expected behavior of matplotlib.

> This is actually a slightly different issue

Right. 😄 It's an extended version of the first problem: here the nan values are in the middle of the array, not at the beginning or end.

@piccolomo
Owner

Done! Somehow I managed!

[screenshot of the resulting plot]

It was not trivial, but easier than I thought. I will upload the code soon.

@piccolomo
Owner

piccolomo commented Sep 1, 2022

Interestingly, this also solves the first issue you posted:
[screenshot of the resulting plot]

matplotlib would not handle it the same way

@Freed-Wu
Contributor Author

Freed-Wu commented Sep 1, 2022

> It was not trivial, but easier than I thought. I will upload the code soon.

Great!

> matplotlib would not handle it the same way

Aha 😄. I'll open an issue on matplotlib to ask whether this is a bug or expected behavior.

@piccolomo
Owner

piccolomo commented Sep 1, 2022

Cool, thanks a lot for the report. Luckily, the solution did not involve splitting the data (otherwise I would have had to split all colors, styles, and other parameters, which would have been a pain), but the matplotlib example you posted helped me understand the problem anyway. Basically, I let nan or None values "percolate" through the calculations until it is time to draw them, at which point they are simply discarded. I was expecting to get lost in tedious programming, but I got lucky.
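The "percolate" idea can be illustrated with a toy text renderer. This is a hypothetical sketch, not plotext's implementation: `to_cell` and `render` are made-up names, the canvas is a plain list of character rows, and it assumes each axis has at least one valid value. Missing values pass through the scaling step as `None` and are dropped only when a character is about to be placed, so the x limits are computed from the full x data.

```python
import math

def to_cell(v, lo, hi, size):
    """Map v from [lo, hi] to a cell index in [0, size - 1].

    None and nan "percolate" through unchanged instead of being removed.
    """
    if v is None or math.isnan(v):
        return None
    if hi == lo:
        return 0
    return round((v - lo) / (hi - lo) * (size - 1))

def render(x, y, width=20, height=5):
    # Limits are computed per axis from that axis' valid values only
    # (assumes each axis has at least one valid value).
    finite = lambda vs: [v for v in vs if v is not None and not math.isnan(v)]
    xs, ys = finite(x), finite(y)
    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)
    cols = [to_cell(v, x_lo, x_hi, width) for v in x]
    rows = [to_cell(v, y_lo, y_hi, height) for v in y]
    canvas = [[" "] * width for _ in range(height)]
    for c, r in zip(cols, rows):
        if c is None or r is None:  # missing values are discarded only here, at draw time
            continue
        canvas[height - 1 - r][c] = "*"  # larger y values land nearer the top row
    return "\n".join("".join(row) for row in canvas)

nan = float("nan")
print(render(list(range(10)), [nan] * 4 + [0.0, 1.0, 2.0, 3.0, 4.0, 5.0], width=10, height=6))
```

In the printed canvas the first four columns stay empty: the x axis still spans all ten positions even though the matching y values are missing.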

@Freed-Wu
Contributor Author

Freed-Wu commented Sep 1, 2022

> Luckily, the solution did not involve splitting the data (otherwise I would have had to split all colors, styles, and other parameters, which would have been a pain), but the matplotlib example you posted helped me understand the problem anyway.

Great! 👍

@Freed-Wu
Contributor Author

Freed-Wu commented Sep 1, 2022

matplotlib/matplotlib#23797 (comment)

> I think we auto-scale based on valid points, if there is a nan in the y-value it is not a valid point so we are auto-scaling the x limits to show you the valid data.

So this is the expected behavior of matplotlib. Does plotext support auto-scaling?

@piccolomo
Owner

No, the new code does not update the x or y limits; it just lets the nan or None values through until plotting time, when they are discarded. The x limits are preserved because the x data contains only regular values, so its limits are calculated as usual.
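The difference between the two behaviors comes down to which points feed the axis limits. The two hypothetical functions below contrast them on data shaped like the original report (y set to nan for the first 40 of 200 points): the first counts only points where both coordinates are valid, as in the quoted matplotlib comment; the second computes the x limits from the x data alone, as described for plotext. Neither is the actual code of either library.

```python
import math

def is_valid(v):
    return v is not None and not math.isnan(v)

def limits_from_valid_points(x, y):
    # Only (x, y) pairs where both coordinates are valid count toward the x limits
    xs = [a for a, b in zip(x, y) if is_valid(a) and is_valid(b)]
    return min(xs), max(xs)

def limits_per_axis(x):
    # The x limits depend on the x data alone; nan in y has no effect
    xs = [a for a in x if is_valid(a)]
    return min(xs), max(xs)

nan = float("nan")
x = list(range(200))
y = [nan] * 40 + [1.0] * 160
print(limits_from_valid_points(x, y))  # (40, 199)
print(limits_per_axis(x))              # (0, 199)
```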


@piccolomo changed the title from "[feature] Support Nan" to "Feature Request: Better Handling Nan and None Values" on Sep 2, 2022
@piccolomo
Owner

Hi @Freed-Wu, I finally updated plotext with the changes you asked for.

The new version is available on GitHub for now, and soon also on PyPI. To install it, follow the instructions here.

Your changes are documented here and you have been credited here.

Any feedback is welcome.

Thanks a lot for your inputs and all the best,
Savino
