Bitmaps? #14

Closed
fdncred opened this issue Oct 18, 2018 · 44 comments

@fdncred
Contributor

fdncred commented Oct 18, 2018

I'm interested in loading a bitmap into a NumSharp array. I realize that isn't written yet, but I'm concerned that if I write and contribute that method it will be way too slow to do anything with. What are your thoughts on speed?

Thanks,
Darren


EDIT:
System.Drawing.Bitmap is now supported by a separate package.

@Oceania2018
Member

@fdncred There is a sample that loads the MNIST image dataset into an NDArray. It handles 10K images and works very well.

@fdncred
Contributor Author

fdncred commented Oct 18, 2018

@Oceania2018 Thanks for the info but there aren't a lot of 10k images that need processing. For instance, the one I'm working on now is 2531 x 2081 at 3 bytes per pixel, which makes for > 15.8 million bytes in a 3-dimensional array. Obviously, anything that big would be slower, but I'm not sure I want to find out how much slower. I may have to code it up just to see.

@Oceania2018
Member

@fdncred Hi, I haven't tested a dataset that large. There is definitely a way to optimize it, such as using the yield keyword: read the images one by one and feed them into the neural network.
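The yield idea can be sketched roughly as follows. This is only an illustration under assumptions: `ImageStream` and `ReadOneByOne` are hypothetical names, not part of NumSharp, and the sketch yields raw file bytes rather than decoded pixels.

```csharp
using System.Collections.Generic;
using System.IO;

public static class ImageStream
{
    // Hypothetical helper: lazily yields the raw bytes of each file.
    // Nothing is read until the caller iterates, and only one file's
    // bytes are produced per iteration step.
    public static IEnumerable<byte[]> ReadOneByOne(IEnumerable<string> paths)
    {
        foreach (var path in paths)
        {
            yield return File.ReadAllBytes(path);
        }
    }
}
```

A caller could then write `foreach (var bytes in ImageStream.ReadOneByOne(paths)) { ... }` and process each image before the next one is loaded. This keeps memory bounded across many images, but it does not make a single very large image any smaller.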

@fdncred
Contributor Author

fdncred commented Oct 30, 2018

@Oceania2018 I experimented with Bitmaps and it just crashes Visual Studio when I try to inspect the np variable. I take that to mean that either the arrays are so large that NumSharp can't handle them or I've constructed the numpy array incorrectly. I suspect I'm not using NumSharp correctly. Any ideas?

My system is pretty beefy - 16GB Ram, 12-CPUs.

This is what I did.

  1. Make your project multitargeting following this guide.
  2. Build the net472 version of numsharp and include the assembly in my other project.
  3. Add this code and the image below and set a break point at the end so I can inspect the np variable.

Note: If you uncomment the byteImage code, my code creates a 3 dimensional byte array. I use this in other places and works great. This code was meant for testing and only really handles 32-bpp and 24-bpp images.

The intent of the code was to create a 2d array of image data by following the example of Array2Dim TestMethod. I know it's not right since np[0] only has two values where it should have 3. I'd like to create a 3d array like with byteImage but I'm not sure how to do that with NumSharp.

private void BitmapToArray(string notes1a)
{
    var bmp = (System.Drawing.Bitmap)System.Drawing.Image.FromFile(notes1a);
    var bmpd = bmp.LockBits(new System.Drawing.Rectangle(0, 0, bmp.Width, bmp.Height), System.Drawing.Imaging.ImageLockMode.ReadWrite, bmp.PixelFormat);
    var dataSize = bmpd.Stride * bmpd.Height;
    byte[] data = new byte[dataSize];
    Marshal.Copy(bmpd.Scan0, data, 0, data.Length);
    bmp.UnlockBits(bmpd);

    var includeAlpha = false;
    var stride = bmpd.Stride;
    //var byteImage = new byte[bmpd.Height][][];
    var w = bmpd.Width;
    var dataLen = data.Length / 4;

    var np = new NumSharp.NDArray<List<int>>();
    var list = new List<List<int>>();

    for (int i = 0; i < dataLen; i++)
    {
        var x = i % w;
        var y = i / w;
        //if (x == 0)
        //    byteImage[y] = new byte[w][];
        var o = (y * stride + x * 4);
        if (includeAlpha)
        {
            //byteImage[y][x] = new byte[] { data[o], data[o + 3], data[o + 2], data[o + 1] };
            list.Add(new List<int>() { data[o], data[o + 3], data[o + 2], data[o + 1] });
        }
        else // FYI - Data is in BGR layout
        {
            //byteImage[y][x] = new byte[] { data[o + 3], data[o + 2], data[o + 1] };
            list.Add(new List<int>() { data[o + 3], data[o + 2], data[o + 1] });
        }
    }
    np = np.Array(list);
}

[Image: notesa1]

@dotChris90
Member

@fdncred Hm, I will check your code in Visual Studio Code on Windows with .NET Core 2.1, and maybe I will try to use NDArray<double[]> somehow. I am not 100% sure that List is the best data type. We use it often in tests because the lists (arrays) there are small, but it is possible to use double[] (plain C# arrays) instead, and they have much better performance.

@dotChris90
Member

@fdncred Not sure if it is important, but which operating system do you use? Normal Windows?

@fdncred
Contributor Author

fdncred commented Oct 30, 2018

Windows 10 1809 Build 17763.55 64-bit

@fdncred
Contributor Author

fdncred commented Oct 30, 2018

This may be closer but still not right because the shape is wrong.

private void BitmapToArray(string notes1a)
{
    var bmp = (System.Drawing.Bitmap)System.Drawing.Image.FromFile(notes1a);
    var bmpd = bmp.LockBits(new System.Drawing.Rectangle(0, 0, bmp.Width, bmp.Height), System.Drawing.Imaging.ImageLockMode.ReadWrite, bmp.PixelFormat);
    var dataSize = bmpd.Stride * bmpd.Height;
    byte[] data = new byte[dataSize];
    Marshal.Copy(bmpd.Scan0, data, 0, data.Length);
    bmp.UnlockBits(bmpd);

    var includeAlpha = false;
    var stride = bmpd.Stride;
    //var byteImage = new byte[bmpd.Height][][];
    var w = bmpd.Width;
    var h = bmpd.Height;
    var dataLen = data.Length / 4;

    var arr = new NumSharp.NDArray<NumSharp.NDArray<NumSharp.NDArray<byte>>>();
    arr.Data = new NumSharp.NDArray<NumSharp.NDArray<byte>>[h];
    for (int i = 0; i < dataLen; i++)
    {
        var x = i % w;
        var y = i / w;
        if (x == 0)
        {
            //byteImage[y] = new byte[w][];
            arr[y] = new NumSharp.NDArray<NumSharp.NDArray<byte>>();
            arr[y].Data = new NumSharp.NDArray<byte>[w];
        }
        var o = (y * stride + x * 4);
        if (includeAlpha)
        {
            //byteImage[y][x] = new byte[] { data[o], data[o + 3], data[o + 2], data[o + 1] };
            arr[y][x] = new NumSharp.NDArray<byte>();
            arr[y][x].Data = new byte[4];
            arr[y][x].Data[0] = data[o];
            arr[y][x].Data[1] = data[o + 3];
            arr[y][x].Data[2] = data[o + 2];
            arr[y][x].Data[3] = data[o + 1];
        }
        else // FYI - Data is in BGR layout
        {
            //byteImage[y][x] = new byte[] { data[o + 3], data[o + 2], data[o + 1] };
            arr[y][x] = new NumSharp.NDArray<byte>();
            arr[y][x].Data = new byte[3];
            arr[y][x].Data[0] = data[o + 3];
            arr[y][x].Data[1] = data[o + 2];
            arr[y][x].Data[2] = data[o + 1];
        }
    }
}

I'm trying to match the array from this python code. Which is shape(2531, 2081, 3).

pil_img = Image.open(filename)
img = np.array(pil_img)

@dotChris90
Member

Understood. When I am at home I will maybe try to extend the Array method for this. By the way, thanks for showing us the Python code. It is important that we match the APIs as well as possible.

@dotChris90
Member

@fdncred Did you build the code from GitHub or take the NuGet package? Just to know how best to support your case.

@fdncred
Contributor Author

fdncred commented Oct 30, 2018

I downloaded and compiled the code from Github.

That's what I meant when I said this above.

This is what I did.

1. Make your project multitargeting following this guide.
2. Build the net472 version of numsharp and include the assembly in my other project.
3. Add this code and the image below and set a break point at the end so I can inspect the np variable.

@dotChris90
Member

Ah yes. Lol, sorry, my fault. OK, I will test it at home.

@fdncred
Contributor Author

fdncred commented Oct 30, 2018

No worries, thanks for your help.

@Oceania2018
Member

@dotChris90 Do you think we should refactor the NDArray class for each specific generic type? We could separate NDArray into NDArray<double[]>, or NDArray<T> where T is limited to value types, and change

public IList<TData> Data { get; set; }

to

public T[] Data { get; set; }

For 3 dimensions it would be

public class NDArray3<T> 
{
    public T[,,] Data { get; set; }
}

I think this will definitely give the best performance.

@dotChris90
Member

@Oceania2018 Yes, maybe we should consider some restructuring.

Performance
I was really surprised to read that the jagged array double[][] is supposed to be faster than double[,]. It is mentioned often on Stack Overflow, and in http://www.monitis.com/blog/improving-net-application-performance-part-13-arrays/ the author gives a reason for it. I am not 100% sure this is really true; I just want to mention it here.
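One way to check the claim on a given machine is a small timing sketch like the one below. It is a rough measurement under assumptions, not a rigorous benchmark: `ArrayLayoutBenchmark` and its methods are hypothetical names, a flat 1-D layout is included for comparison, and results will vary with runtime version, build configuration, and array size.

```csharp
using System;
using System.Diagnostics;

public static class ArrayLayoutBenchmark
{
    // Sums a jagged array row by row.
    public static double SumJagged(double[][] a)
    {
        double sum = 0;
        for (int i = 0; i < a.Length; i++)
        {
            var row = a[i];
            for (int j = 0; j < row.Length; j++) sum += row[j];
        }
        return sum;
    }

    // Sums a rectangular (multidimensional) array.
    public static double SumRectangular(double[,] a)
    {
        double sum = 0;
        int rows = a.GetLength(0), cols = a.GetLength(1);
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < cols; j++) sum += a[i, j];
        return sum;
    }

    // Sums a flat array that stores rows * cols values in row-major order.
    public static double SumFlat(double[] a, int rows, int cols)
    {
        double sum = 0;
        for (int i = 0; i < rows; i++)
        {
            int rowStart = i * cols;
            for (int j = 0; j < cols; j++) sum += a[rowStart + j];
        }
        return sum;
    }

    // Builds n x n test data in all three layouts and prints the timings.
    public static void Run(int n)
    {
        var jagged = new double[n][];
        var rect = new double[n, n];
        var flat = new double[n * n];
        for (int i = 0; i < n; i++)
        {
            jagged[i] = new double[n];
            for (int j = 0; j < n; j++)
            {
                double v = i + j;
                jagged[i][j] = v;
                rect[i, j] = v;
                flat[i * n + j] = v;
            }
        }
        Time("jagged     ", () => SumJagged(jagged));
        Time("rectangular", () => SumRectangular(rect));
        Time("flat       ", () => SumFlat(flat, n, n));
    }

    private static void Time(string label, Func<double> work)
    {
        work(); // warm-up call so JIT compilation is not part of the timing
        var sw = Stopwatch.StartNew();
        double result = work();
        sw.Stop();
        Console.WriteLine($"{label}: {sw.Elapsed.TotalMilliseconds:F2} ms (sum = {result})");
    }
}
```

Calling `ArrayLayoutBenchmark.Run(2000)` from a Release build gives a first impression; the numbers should be treated as indicative only.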

Generic aspect
Honestly, I was wondering (it is just a consideration, I am not 100% sure) whether it would be more user friendly if the generic type T were exactly our array storage (the member "Data"). We could restrict the generic type to classes that implement IList, or that derive from Array and implement IList (not sure if multiple restrictions are possible). Users could then be 100% sure what they are constructing. At the moment I think it is complex for many, and this would reduce the complexity. Look at this example:

var A = new NDArray<double[][]>().Array( .... );
var b = new NDArray<double[]> ().Array ( .... );

var c = A.Dot(b);

It is quite clean, since you see "OK, A is an array of arrays, so a matrix" and "OK, b is an array".
NDArray is in the end just an adapter class that extends the existing arrays in the C# world.

So an NDArray would look like this

public partial class NDArray<TData> where TData : IList
{
    public TData Data { get; set; }
}
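On the question of multiple restrictions: C# does allow several constraints on one type parameter, separated by commas. As far as I know, `System.Array` itself cannot be used as a constraint (the compiler rejects it as a special class), so the sketch below constrains on `class, IList` instead. `ArrayAdapter` is a hypothetical name used only for illustration.

```csharp
using System.Collections;

// Hypothetical adapter: TData must be a reference type that implements IList.
// Both double[] and double[][] satisfy this, since every .NET array
// implements the non-generic IList interface.
public class ArrayAdapter<TData> where TData : class, IList
{
    public TData Data { get; set; }

    // Number of elements in the outermost dimension of the storage.
    public int Length => Data == null ? 0 : Data.Count;
}
```

With this, `new ArrayAdapter<double[][]>()` compiles while `new ArrayAdapter<double>()` does not, which gives the compile-time distinction between vector and matrix described above.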

@dotChris90
Member

@Oceania2018 What do you think? Honestly, I do not want to start something like "NDArray2" or "NDArray3" because it is not the numpy API ;)

@fdncred
Contributor Author

fdncred commented Oct 30, 2018

An alternate approach is to compile the numpy source code into a c++ dll and then p/invoke calls out of it. This is kind of what python does. Numpy isn't written in python, just the wrapper is. Then you'd have all the speed of numpy and one would have to figure out how to marshal data back and forth.

Update
I take it back. After looking at the numpy source code and libopenblas I'm not sure p/invoking would even be possible. What a mess. No wonder no one else has done it.

@fdncred
Contributor Author

fdncred commented Oct 30, 2018

But I did find this. Looks like it could be helpful.

@Oceania2018
Member

@dotChris90 I like the jagged array.

var A = new NDArray<double[][]>()

@dotChris90
Member

@Oceania2018 OK. If you do not mind, I would do a total restructure on Friday and over the weekend (I have some holidays). I suggest just one of us (so me) does this, because it also includes changing the unit tests etc.

@Oceania2018
Member

I have another idea. What about creating a new class named NumSharp? It would be the equivalent of np when you do var np = new NumSharp(), then np.arange(10). NumSharp will act like a router.

@Oceania2018
Member

NumSharp will hide the mass of NDArray. I agree with you. You will do the restructuring. Appreciate.

@dotChris90
Member

@fdncred Interesting. It seems NumSharp is not the only project trying to reconstruct numpy, lol. Thanks for the post. I think that .NET Core 3.x will bring a lot more for machine learning, array performance and so on. That is the reason I am avoiding C at the moment. But if we find out in 2019 that .NET Core 3.0 does not bring what we wish for, we will maybe go with C. About the numpy project: I think at the moment they use Python's internal mechanisms by including Python.h in their files. If we wanted to integrate this into .NET, it feels like a little too much wrapping, and we would still have to implement the classes. I would really like to see the numpy team write their core in C, compile it to a shared object, and link their Python object code against that shared object. Anyway, maybe we can have a look at their GitHub repo :D

@dotChris90
Member

@Oceania2018 Honestly, np = new NumSharp() is a fantastic idea, lol. It makes everything look more like numpy. We could try to use .NET scripting or PowerShell and make some examples after restructuring the array stuff.

@Oceania2018
Member

Oceania2018 commented Oct 30, 2018

@dotChris90 Sounds great, let's do it. I will add a NumSharp class, you do the NDArray restructuring.
Another advantage of a NumSharp class is that it makes our API more stable for high-level usage. We can change just the NDArray implementation behind it, which is better OOP encapsulation.

var np = new NumSharp();
np.arange(10);
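A minimal sketch of what such a router class could look like is shown below. It is an assumption-laden illustration, not the class that was later added to the project: `NumSharpRouter` is a hypothetical name, it returns a plain `int[]` instead of an NDArray, and it follows numpy's `(start, stop, step)` argument order with positive steps only.

```csharp
using System;

// Hypothetical facade: callers only talk to this class, so the array
// implementation behind it can change without breaking user code.
public class NumSharpRouter
{
    // Mirrors numpy.arange(stop): values 0, 1, ..., stop - 1.
    public int[] arange(int stop) => arange(0, stop, 1);

    // Mirrors numpy.arange(start, stop, step) for positive steps.
    public int[] arange(int start, int stop, int step = 1)
    {
        if (step <= 0) throw new ArgumentOutOfRangeException(nameof(step), "Only positive steps are supported in this sketch.");
        // Ceiling division gives the number of values in [start, stop).
        int count = stop > start ? (stop - start + step - 1) / step : 0;
        var result = new int[count];
        for (int i = 0; i < count; i++) result[i] = start + i * step;
        return result;
    }
}
```

The lower-case method names deliberately break C# naming conventions so that `np.arange(10)` reads the same as it does in Python.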

@fdncred Are you interested joining this project?

Created a new issue #34

@fdncred
Contributor Author

fdncred commented Oct 30, 2018

@dotChris90 I posted that C++ link in order to help port to C#. For me, at least, it's easier to read C/C++ and turn it into C# than it is to read Python and turn it into C#. Here's another C++ port of the NumPy functionality with help. Again, it may just be useful to see how other people reinterpret numpy.

@dotChris90
Member

@fdncred I made a test method for your case to try and play around with this use case of byte[][][]. In Visual Studio Code the debugger works fine for this image: slow, but fine. But not for our NDArray; at the moment I have only tried it with byte[][][].

I would suggest that I restructure our NDArray this week and extend the Array method. Honestly, until now we did not think about tensor types like byte[][][]. Maybe that is the reason the shaping method does not work properly. I will let you know when the restructure is finished, and then you can try something like

var myArray = new NDArray<byte[][][]>().Array(new Bitmap("pathToImage"));

For now, if you want to play around with the code, maybe you could try:

var myArray = new NDArray<byte[][]>(); // an NDArray of byte array of byte array; it looks like a matrix, which is why I want to restructure

@dotChris90
Member

@fdncred Sure. Honestly, the link was interesting, and I totally agree with you. C++ and C# are much closer to each other than to Python. Python is a nice language, but a lot of things are missing, generics for example. Maybe I will have a deeper look at these C++ projects.

@dotChris90
Member

@fdncred Just a question out of curiosity: what API would you suggest implementing for byte[][][]? In other words, what would be good to have for images?

@dotChris90
Member

@fdncred @Oceania2018 I checked your link https://xtensor.readthedocs.io/en/latest/numpy.html. Amazing! But I asked myself: it is not possible to have a generic like array<double,2> in C#, am I right? It looks extremely nice for users, but I have never seen this in C# or the .NET world in general.

@Oceania2018
Member

@dotChris90 I just discussed this with someone else. We have another solution. Please hold on and don't make any changes.

@dotChris90
Member

@Oceania2018 OK, I will do nothing for today. But what was the discussion about: the NDArray<double[][]>, the Bitmap, or np = new NumSharp? :D

@Oceania2018
Member

Oceania2018 commented Oct 30, 2018

I pushed code. Please refer to NDArrayOptimized. All data should be persisted in a one-dimensional array. NDArrayOptimized will parse the 1D array into an array of any dimension only when the data is used.

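[TestMethod]
public void arange()
{
    var np = new NDArrayOptimized<int>();

    // The SequenceEqual results are asserted so that a mismatch fails the test.
    np.arange(3);
    Assert.IsTrue(Enumerable.SequenceEqual(np.Data, new int[] { 0, 1, 2 }));

    // Argument order here is (stop, start).
    np.arange(7, 3);
    Assert.IsTrue(Enumerable.SequenceEqual(np.Data, new int[] { 3, 4, 5, 6 }));

    // Argument order here is (stop, start, step).
    np.arange(7, 3, 2);
    Assert.IsTrue(Enumerable.SequenceEqual(np.Data, new int[] { 3, 5 }));
}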

NDArrayOptimized will return the corresponding form for a given Shape, like shape(3, 5), by parsing the 1D array into an n-dimensional array.
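The idea of one flat array plus a shape can be sketched as follows. This is not the NDArrayOptimized code from the repository; `FlatArray` is a hypothetical class, and it assumes row-major (C-order) layout, which is also numpy's default.

```csharp
using System;

// Hypothetical illustration: all values live in one 1-D array and the
// shape only decides how an n-dimensional index maps to an offset.
public class FlatArray<T>
{
    public T[] Data { get; }
    public int[] Shape { get; }

    public FlatArray(T[] data, params int[] shape)
    {
        int expected = 1;
        foreach (var dim in shape) expected *= dim;
        if (expected != data.Length)
            throw new ArgumentException("Shape does not match the number of elements.");
        Data = data;
        Shape = shape;
    }

    // Row-major offset: for shape (3, 5), index (i, j) maps to i * 5 + j.
    public int Offset(params int[] index)
    {
        if (index.Length != Shape.Length)
            throw new ArgumentException("Index rank does not match shape rank.");
        int offset = 0;
        for (int d = 0; d < Shape.Length; d++)
        {
            if (index[d] < 0 || index[d] >= Shape[d])
                throw new IndexOutOfRangeException();
            offset = offset * Shape[d] + index[d];
        }
        return offset;
    }

    public T this[params int[] index]
    {
        get => Data[Offset(index)];
        set => Data[Offset(index)] = value;
    }
}
```

With 15 values and shape (3, 5), `array[1, 2]` reads element 7 of the underlying storage. Reshaping to (5, 3) would only require a new `Shape`; the data itself does not move.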

@fdncred
Contributor Author

fdncred commented Oct 30, 2018

@dotChris90 I don't understand your question about what API for byte[][][]. Sorry. Having an image in a byte[][] or byte[][][] is only useful as it relates to numpy's algorithms. If you look at this python project you'll see how they're using it. This python project is where I got the idea to use NumSharp when I converted it to C#.

@Oceania2018
Member

@fdncred I think we would do it like this:

var np = new NDArrayOptimized<byte>();
np.reshape(n1, n2, n3);
// load image bytes into np

@fdncred
Contributor Author

fdncred commented Oct 30, 2018

@Oceania2018 That seems intuitive to me as long as it returns np[height][width][byte[3]]. I think that's what python is doing but maybe it's returning tuples - I can never tell with imaging on python.

@dotChris90
Member

@fdncred The project page is enough. So far I have usually worked with time series, not much with images :D So I don't know well which functions are used most. I was just looking for some inspiration or use cases.

@Oceania2018
Member

Oceania2018 commented Oct 31, 2018

@fdncred I created an array of 1M bytes; it cost 38 ms.

[Two images]

@Oceania2018
I get similar performance.
[Image: perf]
With more realistic bitmap dimensions:
[Image: bmpsim]
Not sure why my shape & ndim are different.
[Image: bmpsim-expand]

@dotChris90
Member

dotChris90 commented Oct 31, 2018

@Oceania2018 Do I understand you correctly? You want to store everything (1D, 2D, 3D, ... ND) in a single array, and properties like Shape decide the dimension? Do not get me wrong, but I think this will lead to some problems.

1) Our methods will get longer and less well structured. Until now we can have "MethodName(NDArray<double>)" and "MethodName(NDArray<double[]>)" to distinguish between vector and matrix. Thanks to polymorphism (method overloading) we can have 2 different methods with the same name but different parameters. If our objects are always the same NDArray type you cannot do this, and instead always need a huge "if else" structure: if the method sees a vector do this, if a matrix do that. This leads to fewer files but also increases the danger that people have to work on one file at the same time.

2) It is not totally OOP in my opinion. In OOP we say "this is a matrix and it has these methods and properties" and "this is a vector with these properties and methods". But here we say it is an array that can be anything. That is dynamic interpreter style, not compiler style. It is Python, not C#.

3) Performance. Sorry to say, but I am not sure a huge array brings better performance. We should do some tests to find out, but I am very sure jagged arrays are faster than one huge array, where you have to compute the position of an element on every access.

4) Do you really, really want to rewrite all the operations and methods? It will be hard, because on Stack Overflow you will find code examples with double[][] and so on, but never an example with double[] for a matrix.

5) Why do you want to create your own array? The .NET world already has very fine and optimised arrays. Python does not, so they developed one from scratch. So we should always stay with the .NET array type system.
That is why I suggested NDArray<double[][]>. I know that in the end it is a jagged array, so I know the corresponding type, and most important, we can keep NDArray as an adapter class and not a fancy new class of our own. .NET does not need this.

So please give me some reasons why NDArray<double> matrix = new NDArray<double>().Array(...) is better than NDArray<double[][]> matrix = new NDArray<double[][]>().Array()?

I know QuantStack does it, and I find it a little weird, since I CAN NOT SEE FROM MY CODE WHAT Numeric TYPE I HAVE. Have a look again:

var matrix = new NDArray<double[][]>() // I can see 100% it must be a matrix
var matrix = new NDArray< double >() // Is it a vector, a matrix? no! it is something we do not know -.-

So give me some arguments and pros. I do not want something like "because QuantStack does it like this". I want something like "better performance for matrix multiplication", because honestly all the points I listed make me uncomfortable with the QuantStack solution.

@dotChris90
Member

@Oceania2018 I will open another issue to discuss this. "Bitmaps" is not the best name for it ;)

@dotChris90
Member

@fdncred I pushed an array method to NumSharp which accepts a Bitmap object as input parameter. You can try it and play around. :)

With the new 1D array strategy we could simply take the byte array from the Marshal.Copy call and put it into the NDArray Data property. We just need to set the shape to (height, width, 3).

The only thing I don't understand is that the order of the RGB vector differs between numpy and Marshal.Copy. Shall we correct this?
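One detail to watch with the 1D strategy: the buffer copied from Scan0 has `Stride` bytes per row, which can be larger than `width * bytesPerPixel` because of row padding. A shape of (height, width, 3) only fits if that padding is removed first. The sketch below shows one way to do that on a plain byte array; `PixelPacking.RemoveRowPadding` is a hypothetical helper, and it assumes a positive stride (top-down bitmap).

```csharp
using System;

public static class PixelPacking
{
    // Hypothetical helper: copies each row's pixel bytes out of a padded
    // (strided) buffer into a tightly packed one, so that the result has
    // exactly height * width * bytesPerPixel bytes.
    public static byte[] RemoveRowPadding(byte[] strided, int stride, int width, int height, int bytesPerPixel)
    {
        int rowBytes = width * bytesPerPixel;
        if (stride < rowBytes)
            throw new ArgumentException("Stride is smaller than one row of pixels.");
        if (height > 0 && strided.Length < stride * (height - 1) + rowBytes)
            throw new ArgumentException("Buffer is too small for the given dimensions.");

        var tight = new byte[rowBytes * height];
        for (int y = 0; y < height; y++)
        {
            // Source rows start every `stride` bytes, destination rows every `rowBytes`.
            Buffer.BlockCopy(strided, y * stride, tight, y * rowBytes, rowBytes);
        }
        return tight;
    }
}
```

For the 2531 x 2081 image at 3 bytes per pixel, a row is 7593 bytes, which is not a multiple of 4, so some padding per row is likely. The packed result can then be used as flat storage with shape (height, width, 3).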

@dotChris90
Member

@fdncred I took your image and example code for the method and unit test.

Hope you don't mind :)

@fdncred
Contributor Author

fdncred commented Nov 2, 2018

@dotChris90 I'll take a look at it. I have no problem with you using any of the code I've pushed or put in issues, so feel free to use it without question.

The thing about dotnet bitmaps is they're stored in BGR format. So that may be why the vector is different. So, typically there's a byte swap of R & B to get them aligned properly.

I see some things I'd change but this is definitely a good start. We just have to figure out what BPP we will support and be able to handle those flavors of bitmap.

For speed purposes we could also use unsafe calls on bmp.Scan0 instead of marshaling. Marshaling isn't exactly fast, but we can decide that later.
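The R and B swap mentioned above can be sketched on a plain byte array like this. `ChannelOrder.SwapRedBlue` is a hypothetical helper, and it assumes tightly packed pixels with no row padding and at least 3 bytes per pixel.

```csharp
using System;

public static class ChannelOrder
{
    // Hypothetical helper: swaps the first and third byte of every pixel
    // in place, turning BGR into RGB (or BGRA into RGBA when
    // bytesPerPixel is 4; the alpha byte is left untouched).
    public static void SwapRedBlue(byte[] pixels, int bytesPerPixel)
    {
        if (bytesPerPixel < 3)
            throw new ArgumentException("At least 3 bytes per pixel are required.");

        for (int o = 0; o + bytesPerPixel <= pixels.Length; o += bytesPerPixel)
        {
            byte tmp = pixels[o];
            pixels[o] = pixels[o + 2];
            pixels[o + 2] = tmp;
        }
    }
}
```

After the swap, the channel order should match what `np.array(pil_img)` produces for an RGB image in Python.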

@dotChris90
Member

@fdncred Totally agree. First let's make a nice start for NumSharp.

:)


3 participants